Enhanced Troubleshooting Guides #1884
Draft
gerth2
wants to merge
10
commits into
wpilibsuite:main
from
gerth2:gerth2_troubleshooting
Commits
7f2aeea
Add common field problems article
Daltz333 893735d
add to root toc
Daltz333 9051fdc
Fix trailing whitespace
Daltz333 bcb78cb
Add additional information
Daltz333 af9945e
troubleshooting skeleton
gerth2 f759045
more wip troubleshooting
gerth2 dd712ee
Merge remote-tracking branch 'Daltz333/common-field-problems' into ge…
gerth2 8b95e33
Merge Cleanup
gerth2 56acbe7
more wip
gerth2 29c5b83
more WIP on troubleshooting docs
gerth2
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
Debugging CAN-Related Problems | ||
============================== | ||
|
||
Usual symptoms | ||
|
||
wiring | ||
|
||
termination | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
Debugging Issues while Building Code | ||
==================================== | ||
|
||
Common Symptoms | ||
--------------- | ||
|
||
|
||
gradlew is not recognized... | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
``gradlew is not recognized as an internal or external command`` is a common error that can occur when the project or directory that you are currently in does not contain a ``gradlew`` file. This usually occurs when you open the wrong directory. | ||
|
||
.. image:: images/reading-stacktraces/bad-gradlew-project.png | ||
:alt: Screenshot showing that the left-hand VS Code sidebar does not contain gradlew | ||
|
||
In the above screenshot, you can see that the left-hand sidebar does not contain many files. At a minimum, VS Code needs the following files to properly build and deploy your project. | ||
|
||
- ``gradlew`` | ||
- ``build.gradle`` | ||
- ``gradlew.bat`` | ||
|
||
If any of the above files is missing from your project directory, there are two possible causes. | ||
|
||
- A corrupt or bad project. | ||
- You are in the wrong directory. | ||
|
||
Fixing gradlew is not recognized... | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
``gradlew is not recognized...`` is a fairly easy problem to fix. First identify the problem source: | ||
|
||
**Are you in the wrong directory?** | ||
- Verify which directory actually contains your project, then open that directory in VS Code. | ||
|
||
**Is your project missing essential files?** | ||
- This issue is more complex to solve. The recommended solution is to :ref:`recreate your project <docs/software/vscode-overview/creating-robot-program:Creating a Robot Program>` and manually copy necessary code in. | ||
|
||
|
||
Driving toward Root Cause | ||
------------------------- |
52 changes: 52 additions & 0 deletions
source/docs/software/troubleshooting/common-field-problems.rst
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
Common Field Problems | ||
===================== | ||
|
||
This article details some of the common problems that can plague your robot when it's on the field. It can be extremely frustrating and stressful when your robot breaks down. This article aims to help you find the problem and its resolution. | ||
|
||
.. important:: Remember to never eliminate any possibility! It never hurts to double or even triple check that everything is working properly. | ||
|
||
Robot is stuttering and the RSL lights are dimming | ||
-------------------------------------------------- | ||
|
||
When your robot moves in a jerky fashion and the RSL is dimming, this is usually a sign of :doc:`brownouts </docs/software/roborio-info/roborio-brownouts>`. One of the first steps in resolving a brownout is to identify when it occurred and any notable events that correlate with it. Did you go into a match with your battery too low? Are you drawing too much current somehow? Can you reproduce this in the pit? | ||
|
||
One of the most useful tools for identifying brownout causes is the :doc:`driver station log viewer </docs/software/driverstation/driver-station-log-viewer>`. | ||
|
||
.. image:: /docs/software/roborio-info/images/identifying-brownouts.png | ||
|
||
In the above image, you can see the brownout indicated by the highlighted orange line. The orange line represents dips (or lack of a straight line) in robot voltage. | ||
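The same idea of looking for dips can be sketched in code. The following is only an illustrative sketch: it assumes you already have a list of ``(time, voltage)`` samples from some source, and the function name ``find_voltage_dips``, the sample values, and the 7.0 V threshold are all hypothetical rather than taken from the Driver Station or the roboRIO.

```python
def find_voltage_dips(samples, threshold_volts):
    """Group consecutive samples below the threshold into dips.

    samples: iterable of (time_seconds, voltage) pairs, in time order.
    Returns a list of (start_time, end_time, minimum_voltage) tuples.
    """
    dips = []
    current = None  # [start_time, end_time, min_voltage] of the dip in progress
    for time_s, volts in samples:
        if volts < threshold_volts:
            if current is None:
                current = [time_s, time_s, volts]
            else:
                current[1] = time_s
                current[2] = min(current[2], volts)
        elif current is not None:
            # Voltage recovered, so the dip in progress is finished.
            dips.append(tuple(current))
            current = None
    if current is not None:
        dips.append(tuple(current))
    return dips


# Made-up sample data, for illustration only.
example = [(0.00, 12.4), (0.02, 11.9), (0.04, 6.5), (0.06, 6.1), (0.08, 9.8), (0.10, 12.0)]
print(find_voltage_dips(example, threshold_volts=7.0))
```

Noting when each dip starts makes it easier to line it up with what the robot was doing at that moment.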
|
||
Joystick inputs seem to be dropping | ||
----------------------------------- | ||
|
||
One of the characteristics of lost joystick inputs is that you press a button or move an axis and nothing happens! This can happen for a variety of reasons, so it's important to analyze which one is most likely in your situation. | ||
|
||
.. todo:: looking at the driverstation log and identifying if lost joysticks is a code related .. error:: text | ||
|
||
.. important:: There is a current :ref:`known issue <docs/yearly-overview/known-issues:onboard i2c causing system lockups>` where I2C reads can take a long time or lock up the roboRIO. | ||
|
||
Let's begin by asking a question. Can you reliably reproduce this issue at home or in the pits? This step is critical and assumptions *must not* be made. | ||
|
||
Yes, I can | ||
^^^^^^^^^^ | ||
|
||
This eliminates bandwidth or connectivity issues to the FMS. Some areas to explore are: | ||
|
||
- Are joysticks working properly? | ||
- Sometimes the issue can be as simple as a flaky USB cable or joystick. | ||
|
||
- Is the computer running slow or sluggish? Try restarting it. | ||
- High CPU or disk utilization can indicate that the Driver Station itself is sending inputs late. | ||
|
||
- Is the code doing any long computation or loops? (Misuse of ``for`` and ``while`` loops is a common problem.) | ||
- In most cases, the use of any loops in FRC robot code can be avoided except in rare circumstances. | ||
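To illustrate the last point, here is a minimal sketch of replacing a blocking wait loop with a check that runs once per robot loop iteration. It is plain Python and does not use any WPILib API; the class ``TimedAction`` and its method names are hypothetical, and the caller is assumed to pass in the current time in seconds on every iteration.

```python
class TimedAction:
    """Keeps an action active for a fixed duration without blocking.

    A blocking version would spin in a ``while`` loop until the time is up,
    which prevents the rest of the robot code (including joystick handling)
    from running. Here the elapsed time is checked once per call instead.
    """

    def __init__(self, duration_s):
        self.duration_s = duration_s
        self.start_time_s = None  # None means the action is not running

    def start(self, now_s):
        self.start_time_s = now_s

    def periodic(self, now_s):
        """Call once per loop iteration. Returns True while the action is active."""
        if self.start_time_s is None:
            return False
        if now_s - self.start_time_s >= self.duration_s:
            self.start_time_s = None
            return False
        return True


action = TimedAction(duration_s=1.0)
action.start(now_s=0.0)
print(action.periodic(now_s=0.5))  # still within the one-second window
print(action.periodic(now_s=1.0))  # window has elapsed
```

Because ``periodic`` returns immediately, the main loop stays free to read new inputs on every iteration.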
|
||
No, I cannot | ||
^^^^^^^^^^^^ | ||
|
||
This is likely a bandwidth or IP configuration issue. Try setting your IP configurations to :ref:`DHCP <docs/networking/networking-introduction/ip-configurations:in the pits dhcp configuration>` or :ref:`Static <docs/networking/networking-introduction/ip-configurations:in the pits static configuration>`. Another potential problem could be excessive bandwidth utilization. Try :ref:`measuring your bandwidth utilization <docs/networking/networking-introduction/measuring-bandwidth-usage:viewing bandwidth usage>`. | ||
|
||
Unable to connect to your robot? | ||
-------------------------------- | ||
|
||
See :ref:`docs/software/troubleshooting/networking:Usual Symptoms` |
File renamed without changes.
112 changes: 112 additions & 0 deletions
source/docs/software/troubleshooting/gathering-information.rst
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,112 @@ | ||
Gathering Debug Information | ||
=========================== | ||
|
||
During the cycle of troubleshooting, a key step is to gather data. A large amount of the behavior of a robot's control system is *hidden* from view, and requires special tools to observe. While not exhaustive, the following is a list of common tools that robot software developers should be familiar with. | ||
|
||
Driver Station | ||
-------------- | ||
|
||
The National Instruments Driver Station is the first place to check when the robot does not behave as expected. | ||
|
||
In particular, the :ref:`Diagnostics Tab <docs/software/driverstation/driver-station:Diagnostics Tab>` and :ref:`Messages Tab <docs/software/driverstation/driver-station:Messages Tab>` frequently contain the minimum info needed to start driving toward root cause on a problem. | ||
|
||
Additionally, the :ref:`Log File Viewer <docs/software/driverstation/driver-station-log-viewer:Driver Station Log File Viewer>` provides more detailed time-series graphs of key data values and message logs. | ||
|
||
rioLog | ||
------ | ||
|
||
rioLog is a utility built into the WPILib suite and VS Code. It allows you to remotely view all of the :code:`stdout` and :code:`stderr` messages from your robot program. This includes all warnings, error messages, and print statements that your robot program generates. You can write your own software to generate these messages, as well as read the messages produced by WPILib or a third party. | ||
|
||
See :ref:`Riolog VS Code Plugin <docs/software/vscode-overview/viewing-console-output:Riolog VS Code Plugin>` for more info. | ||
|
||
Command Line Utilities | ||
---------------------- | ||
|
||
The Windows command prompt has a number of useful tools for troubleshooting. | ||
|
||
The Windows command prompt may be accessed from the start menu. It is named :code:`cmd.exe`. The commands we describe here should be typed into the command prompt. | ||
|
||
Using :code:`ping` | ||
^^^^^^^^^^^^^^^^^^ | ||
|
||
:code:`ping` is a utility built into Windows which allows for a basic network connection check between two points. It confirms basic functionality of both the physical layer (wiring or wireless), and a small portion of software. | ||
|
||
It can be invoked by typing :code:`ping`, followed by a space, followed by the IP address to be checked, followed by Enter. For example, checking the IP address :code:`10.12.34.1`: | ||
|
||
.. code-block:: console | ||
|
||
C:\Users\YOUR_USER>ping 10.12.34.1 | ||
|
||
Pinging 10.12.34.1 with 32 bytes of data: | ||
Reply from 10.12.34.1: bytes=32 time=3ms TTL=128 | ||
Reply from 10.12.34.1: bytes=32 time=3ms TTL=128 | ||
Reply from 10.12.34.1: bytes=32 time=3ms TTL=128 | ||
Reply from 10.12.34.1: bytes=32 time=3ms TTL=128 | ||
|
||
Ping statistics for 10.12.34.1: | ||
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss), | ||
Approximate round trip times in milli-seconds: | ||
Minimum = 3ms, Maximum = 3ms, Average = 3ms | ||
|
||
This shows four test "pings" being sent, and the device with IP address :code:`10.12.34.1` responding with a "Yup, I hear ya!" message within three milliseconds. | ||
|
||
If none of the pings receive a response, it likely indicates a total failure which prevents communication - perhaps a cable is unplugged, the device is turned off, or the device doesn't have the expected IP address. | ||
|
||
If only some of the packets come back, it indicates a partial failure preventing some communication. Perhaps a cable is loose, or the wifi network is being rate limited or interfered with. | ||
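The three cases above (all replies, no replies, some replies) can be sketched as a small classifier over the summary line that Windows :code:`ping` prints. This is only a sketch: the function name ``classify_ping`` and the result labels are hypothetical, and it assumes the English-language output format shown in the example above.

```python
import re


def classify_ping(output):
    """Classify Windows ping output as "ok", "partial failure", or "total failure".

    Returns "unknown" if the packet summary line cannot be found.
    """
    match = re.search(r"Sent = (\d+), Received = (\d+), Lost = (\d+)", output)
    if match is None:
        return "unknown"
    sent, received, lost = (int(group) for group in match.groups())
    if sent == 0:
        return "unknown"
    if received == 0:
        return "total failure"    # nothing came back at all
    if lost > 0:
        return "partial failure"  # some, but not all, pings were answered
    return "ok"


summary = "    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),"
print(classify_ping(summary))
```

The summary line used here is copied from the example output above; feeding the function the full captured text of a :code:`ping` run works the same way, since only that one line is searched for.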
|
||
Using :code:`ipconfig` | ||
^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
:code:`ipconfig` is a utility built into Windows which summarizes the configuration of the network interfaces on the device. It can help confirm your computer is actually attached to a robot network, and should be capable of communicating with robot components. | ||
|
||
It is invoked simply by typing :code:`ipconfig` and hitting Enter. | ||
|
||
Here is an example of running it on a computer with one wireless (wifi) network interface and one wired (ethernet) interface, but with neither connected. | ||
|
||
.. code-block:: console | ||
|
||
C:\Users\YOUR_USER>ipconfig | ||
|
||
Windows IP Configuration | ||
|
||
|
||
Wireless LAN adapter Local Area Connection* 1: | ||
|
||
Media State . . . . . . . . . . . : Media disconnected | ||
Connection-specific DNS Suffix . : | ||
|
||
Wireless LAN adapter Wi-Fi: | ||
|
||
Media State . . . . . . . . . . . : Media disconnected | ||
Connection-specific DNS Suffix . : | ||
|
||
Here is another example, with the computer properly connected to team 1234's robot over wifi: | ||
|
||
.. code-block:: console | ||
|
||
C:\Users\YOUR_USER>ipconfig | ||
|
||
Windows IP Configuration | ||
|
||
|
||
Wireless LAN adapter Wi-Fi: | ||
|
||
Connection-specific DNS Suffix . : localdomain | ||
Link-local IPv6 Address . . . . . : fe80::890d:bbae:d81c:d416%7 | ||
IPv4 Address. . . . . . . . . . . : 10.12.34.210 | ||
Subnet Mask . . . . . . . . . . . : 255.255.255.0 | ||
Default Gateway . . . . . . . . . : 10.12.34.1 | ||
|
||
|
||
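To check whether an address reported by :code:`ipconfig` belongs to a robot network, the comparison can be sketched with Python's standard ``ipaddress`` module. This sketch assumes the ``10.TE.AM.x`` addressing pattern seen in the example above (team 1234 maps to ``10.12.34.x`` with a ``255.255.255.0`` subnet mask) and only handles team numbers of up to four digits. The function names are hypothetical.

```python
import ipaddress


def team_network(team_number):
    """Return the assumed 10.TE.AM.0/24 network for a team number (1 to 9999)."""
    if not 1 <= team_number <= 9999:
        raise ValueError("this sketch only handles team numbers from 1 to 9999")
    upper = team_number // 100  # the "TE" part
    lower = team_number % 100   # the "AM" part
    return ipaddress.ip_network(f"10.{upper}.{lower}.0/24")


def is_on_robot_network(address, team_number):
    """True if the IPv4 address falls inside the team's assumed robot network."""
    return ipaddress.ip_address(address) in team_network(team_number)


print(team_network(1234))
print(is_on_robot_network("10.12.34.210", 1234))
```

If the address shown for your adapter is not inside the expected network, the computer is probably connected to a different network than the robot's.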
Manufacturer-Specific Interfaces | ||
-------------------------------- | ||
|
||
Third-party manufacturers provide custom interfaces to help address problems that are specific to their hardware. These include: | ||
|
||
* `REV Robotics Hardware Client <https://docs.revrobotics.com/rev-hardware-client/>`__ | ||
* `Cross the Road Electronics Phoenix Framework <https://docs.ctre-phoenix.com/en/stable/ch05_PrepWorkstation.html>`__ | ||
* `Playing with Fusion's Web-Based Configuration <https://www.youtube.com/watch?v=LMuq73Vojw8&t=336s>`__ | ||
|
||
REV Robotics, Cross the Road Electronics, and Playing with Fusion all supply additional utilities for configuring and troubleshooting their hardware. | ||
|
||
|
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
Troubleshooting | ||
=============== | ||
|
||
.. toctree:: | ||
:maxdepth: 1 | ||
|
||
introduction.rst | ||
gathering-information.rst | ||
code-build.rst | ||
driver-station-errors-warnings.rst | ||
common-field-problems.rst | ||
reading-stacktraces.rst | ||
using-test-mode.rst | ||
loop-overruns.rst | ||
can-bus.rst | ||
networking.rst |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,83 @@ | ||
Introduction to Troubleshooting | ||
=============================== | ||
|
||
*Troubleshooting* is the art of identifying the causes of problems, and using those causes to iterate toward a better solution. | ||
|
||
Issues Will Happen | ||
------------------ | ||
|
||
Every robot will experience problems. These can be frustrating! Rest assured, fixing these issues is something every team goes through in a season. | ||
|
||
This section of the docs is designed to help teams identify and fix common robot issues which have control-system root causes. While not an exhaustive list of all possible issues, the hope is to provide general guidance and specific examples to reduce the most common pain points. | ||
|
||
Symptom vs. Root Cause | ||
---------------------- | ||
|
||
When troubleshooting, be sure to separate *Symptom* and *Root Cause*. | ||
|
||
The *Symptom* is the behavior you actually observe, which is not correct. For example, a robot which can only turn in place (and cannot drive straight) is a *symptom* a team might observe. | ||
|
||
The *Root Cause* is the incorrect software, electrical hookup, or mechanical fault which actually caused the symptom to occur. | ||
|
||
When troubleshooting effectively, a team will work backward from the observed symptom, to the root cause. Ideally, the root cause gets fixed, and in turn the symptom stops manifesting. | ||
|
||
Sometimes, resource constraints might make a team "patch over" a symptom without identifying or fixing the root cause. Teams should tread cautiously here, as patches are prone to break or cause more issues later on. | ||
|
||
On Working Methodically | ||
----------------------- | ||
|
||
Effective troubleshooting requires teams to work methodically. | ||
|
||
The Scientific Process, in Real-Time | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
The core of all troubleshooting strategies is the same as the scientific process. Namely: | ||
|
||
1. Observe the world around you | ||
2. Propose a hypothesis | ||
3. Design and execute a test of that hypothesis | ||
4. Observe and interpret the results | ||
5. Repeat | ||
|
||
In the case of most FRC robot troubleshooting, the hypothesis will be relatively small. A valid hypothesis could simply be "If I add a ``* -1`` to line 354 of my code, it should fix the motor that's running backward". The experiment would then be to make the change, deploy the code, and attempt to reproduce the backward motor issue. If the motor is now running in the correct direction, it is reasonable to assume the hypothesis was correct, and no further action is needed. However, if the issue persists, one could assume the hypothesis was not entirely correct, and the process must be repeated with a new hypothesis. | ||
|
||
Change One Variable at a Time | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
Review comment: It may also be useful here to say that if a troubleshooting step did not change the observed symptom, it should be undone. |
|
||
When interpreting the results of an experiment, it is critical that the experiment has controlled for all but one variable. Having only one changing variable is what allows experiment results to be interpreted to a single root cause. | ||
|
||
If many variables change, and the problem goes away, you will not know which variable actually fixed the root cause. | ||
|
||
While it may be tempting to change a lot of things hoping one of them fixes the issue, this will likely lead to a lot of things changed unnecessarily. | ||
|
||
Undirected Guess and Check is Ineffective | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
A naïve approach to troubleshooting will start by assuming *anything* could be the root cause, and pursue each option one by one. However, in a large and complex system (like a robot), the number of possibilities can be too large to effectively test each one, one by one. | ||
|
||
From this perspective, it is best to start by making a few assumptions about what root causes are *most likely*, and test those first. As you get more experience doing troubleshooting, you'll gain a better intuition for where to start looking for problems. | ||
|
||
However, keep in mind that these are *assumptions*. They're educated guesses as to where the problem *likely is not*, not exhaustive proof that a problem doesn't exist. Always be ready to go back and undo your assumptions if needed. | ||
|
||
Be Egoless | ||
^^^^^^^^^^ | ||
|
||
When troubleshooting, emotions can often start flying, as it sometimes appears blame is being placed. People can get defensive when their component or their design is called into question. | ||
|
||
It's important to keep in mind that everyone is on the same team, working toward the same goal. Be careful to choose words and descriptions which describe and judge ideas, not people. Furthermore, try not to let your own ego and biases get in the way of considering possible faults in the systems you are responsible for. | ||
|
||
Single vs. Multiple Points of Failure | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
The most effective way to troubleshoot is to start by assuming that a *single point of failure* has triggered the symptom. For well designed, simple systems, this is usually the case. | ||
|
||
However, as systems get larger and more complex, it's very possible multiple failures might exist. While this should always be a *secondary* assumption, be careful not to ignore the fact a symptom may be caused by multiple, interacting failures. | ||
|
||
Practice, Practice, Practice | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
Troubleshooting is a learned skill. While there are few concrete facts and figures to memorize, seeing examples of failures and their root causes over and over again is the best way to get better at isolating root causes from symptoms. | ||
|
||
One will often see more experienced mentors or students look at an issue and quickly state a root cause. And, often, they'll be correct. Rest assured, this ability isn't magical or genetic - it's learned. Folks who are good at troubleshooting will *still* go through all the steps and processes these docs describe. However, they draw from a broader set of exposure to recognize patterns faster, and eliminate unlikely possibilities. | ||
|
||
Be intentional about spending time practicing troubleshooting, and try not to worry if it takes you longer than it takes others. |
It's probably worth mentioning keeping a written record of the steps that were taken (why they were taken), and what the observations were.