Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[IDEA] Convert types to correct ones: for(int i=0; i<vec.size(); i++) => for(size_t i=0; i<vec.size(); i++) #23

Open
SamuelMarks opened this issue Oct 10, 2021 · 6 comments

Comments

@SamuelMarks
Copy link
Contributor

Often when building third-party libraries I get a bunch of warnings "comparison between signed and unsigned types is UB".

Not every such occurrence has a trivial solution. But—in my experience—most do. Usually switching just one var from int to size_t also requires tracing all use of that var and changing all those types to size_t also.

From:

unsigned long f() {return 0;}
const size_t v = f();

int main() {
    std::vector<float> vec;
    for(int i=0; i<vec.size(); i++) {}
}

To:

unsigned long f() {return 0;}
const unsigned long v = f();

int main() {
    std::vector<float> vec;
    for(int i=0; i<vec.size(); i++) {}
}

PS: I'm aware that size_type isn't necessarily size_t… but using it here anyway. Just to reiterate: C++ is an afterthought, my main target is C.

Having this tool may aid in compiler optimisation, will reduce UB, and draw down the number of warnings.

Happy to build it myself and release it under CC0 to be compatible with your philosophy. But I'm new to all this, so would benefit from your insights/aid.

How about it? 😃

Thanks for your consideration

@banach-space
Copy link
Owner

Hey,

This would be incredibly useful, yes! Which makes me wonder - has there been any prior art? Have you tried UndefinedBehaviorSanitizer?

With this sort of tools, you want to make sure that you don't get too many false positives. Preferably none. That's the tricky part. Also, I guess that you'd want to identify the problematic cases first and then refactor them, right? As in, a proof-of-concept for this could be implemented in two steps.

Are you aware of a good code-base that could be used to showcase this?

I'm more than happy to help. I don't do much in Clang myself, so I'm not an expert. But I am always keen to learn something new! I've noticed that you have already asked on Discourse. From my experience, Clang folk mostly use cfe-dev. You could try there if there are no replies on Discourse.

All the best,
Andrzej

@SamuelMarks
Copy link
Contributor Author

Thanks Andrzej,

Yeah there are a bunch of nice santisers out there… but I haven't found any that improve the code for you in this way.

I assume there are two trivial cases which should be comparatively straightforward to implement:

  1. Convert type of variable to return type of function (function type takes precedence)… à la auto in C++11;
  2. Convert loop initialiser var—and C89 style var initialised in var but declared elsewhere—that is compared with a var of another type, to the comparison var type (first non-equal var type being compared against takes precedence).

Not sure if I try to hack something together now—with you to check—or if I continue the RFC onto cfe-dev. Oh and also the IRC channel was helpful (if you get there at the right time-of-day).

@banach-space
Copy link
Owner

So you probably want to start with something simple like this:

void foo() {
  unsigned K = 10;
  for (int k = 0; k < K; k++)
    ;
}

Try this:

$ clang -Wextra -c file.cpp
file.cpp:3:22: warning: comparison of integers of different signs: 'int' and 'unsigned int' [-Wsign-compare]
  for ( int k = 0; k < K; k++)
                   ~ ^ ~

So it is possible to analyse this with clang and identify as a problem. Now,

$ clang -cc1 -ast-dump -fcolor-diagnostics file.cpp
TranslationUnitDecl 0x745dae8 <<invalid sloc>> <invalid sloc>
|-TypedefDecl 0x745e410 <<invalid sloc>> <invalid sloc> implicit __int128_t '__int128'
| `-BuiltinType 0x745e0b0 '__int128'
|-TypedefDecl 0x745e480 <<invalid sloc>> <invalid sloc> implicit __uint128_t 'unsigned __int128'
| `-BuiltinType 0x745e0d0 'unsigned __int128'
|-TypedefDecl 0x745e7f8 <<invalid sloc>> <invalid sloc> implicit __NSConstantString '__NSConstantString_tag'
| `-RecordType 0x745e570 '__NSConstantString_tag'
|   `-CXXRecord 0x745e4d8 '__NSConstantString_tag'
|-TypedefDecl 0x745e890 <<invalid sloc>> <invalid sloc> implicit __builtin_ms_va_list 'char *'
| `-PointerType 0x745e850 'char *'
|   `-BuiltinType 0x745db90 'char'
|-TypedefDecl 0x74a3268 <<invalid sloc>> <invalid sloc> implicit __builtin_va_list '__va_list_tag [1]'
| `-ConstantArrayType 0x74a3210 '__va_list_tag [1]' 1
|   `-RecordType 0x745e980 '__va_list_tag'
|     `-CXXRecord 0x745e8e8 '__va_list_tag'
`-FunctionDecl 0x74a3308 <file.cpp:1:1, line:5:1> line:1:6 foo 'void ()'
  `-CompoundStmt 0x74a3698 <col:12, line:5:1>
    |-DeclStmt 0x74a34a8 <line:2:3, col:18>
    | `-VarDecl 0x74a3408 <col:3, col:16> col:12 used K 'unsigned int' cinit
    |   `-ImplicitCastExpr 0x74a3490 <col:16> 'unsigned int' <IntegralCast>
    |     `-IntegerLiteral 0x74a3470 <col:16> 'int' 10
    `-ForStmt 0x74a3660 <line:3:3, line:4:5>
      |-DeclStmt 0x74a3560 <line:3:9, col:18>
      | `-VarDecl 0x74a34d8 <col:9, col:17> col:13 used k 'int' cinit
      |   `-IntegerLiteral 0x74a3540 <col:17> 'int' 0
      |-<<<NULL>>>
      |-BinaryOperator 0x74a3600 <col:20, col:24> 'bool' '<'
      | |-ImplicitCastExpr 0x74a35e8 <col:20> 'unsigned int' <IntegralCast>
      | | `-ImplicitCastExpr 0x74a35b8 <col:20> 'int' <LValueToRValue>
      | |   `-DeclRefExpr 0x74a3578 <col:20> 'int' lvalue Var 0x74a34d8 'k' 'int'
      | `-ImplicitCastExpr 0x74a35d0 <col:24> 'unsigned int' <LValueToRValue>
      |   `-DeclRefExpr 0x74a3598 <col:24> 'unsigned int' lvalue Var 0x74a3408 'K' 'unsigned int'
      |-UnaryOperator 0x74a3640 <col:27, col:28> 'int' postfix '++'
      | `-DeclRefExpr 0x74a3620 <col:27> 'int' lvalue Var 0x74a34d8 'k' 'int'
      `-NullStmt 0x74a3658 <line:4:5>

Clearly there's enough context there for this simple case.

You are more than welcome to start hacking here. But you might get more feedback from more experienced Clang developers on cfe-dev. And if people find this feasible and useful, it might be better to contribute this directly to Clang. Again - this could attract some quality reviews. And I'm active on Phabricator too.

@SamuelMarks
Copy link
Contributor Author

Thanks, good idea, I've made a repo and will see how far I can get once the boilerplate is done: https://github.com/SamuelMarks/type-correct

(can use that same boilerplate for the #22 repo… I'm not a fan of monorepos 📦)

@SamuelMarks
Copy link
Contributor Author

@banach-space Trying to be as succinct as possible, with my directory layout, lines of code, and everything. There's a lot of boilerplate to remember / reverse-engineer from your repo!

WiP (contributions welcome)

@banach-space
Copy link
Owner

There's https://github.com/banach-space/clang-tutor/tree/main/HelloWorld as a minimal example. But otherwise, yeah, takes a bit of time to set everything up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants