Commit
Showing 7 changed files with 17,316 additions and 123 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,85 @@ | ||
# blog generator | ||
# Code-Survey: Uncovering Insights in Complex Systems with LLM | ||
|
||
Make a company and the blog with markdown files. | ||
- **Do we really know how complex systems like Linux work?** | ||
- **How can we understand the design choices and evolution of a highly complex system like the Linux kernel?** | ||
|
||
**Code-Survey** is here to change that: | ||
|
||
- **No human could do this before, but AI can.** | ||
- **No chatbots, document search, or code generation: everyone else is already doing those.** | ||
|
||
Code-Survey helps you explore and analyze the world's largest and most intricate codebases, like the Linux kernel. By carefully **designing a survey** and **transforming** `unstructured data` such as commits and mailing lists into organized, `structured, easy-to-analyze data`, Code-Survey makes it simpler to uncover valuable insights in modern complex software systems. | ||
|
||
With the power of AI and Large Language Models (LLMs), you can ask questions, run queries, and gain a deeper understanding of how systems evolve over time. AI Agents can also help with that analysis. Whether you're a developer, researcher, or enthusiast, Code-Survey bridges the gaps between design, implementation, maintenance, and security, making complex systems more accessible. | ||
|
||
**Let's do Code-Survey!** | ||
|
||
## Linux-bpf Dataset | ||
|
||
The **Linux-bpf dataset** focuses on the eBPF subsystem and is continuously updated via CI. The dataset includes: | ||
|
||
- **680+ expert-selected commits**: Features, commit details, types (Map, Helper, Kfunc, Prog, etc.). | ||
- **12,000+ BPF-related commits**: LLM Agent surveys and summaries. | ||
- **150,000+ BPF subsystem-related emails**: Automatically analyzed by LLM Agents. | ||
|
||
The simplest way to see how this data works is to **upload the CSV to ChatGPT** (or another platform) and ask it questions so that it does the analysis for you! | ||
|
||
To see more details, check the analysis in [report_ebpf.md](docs/report_ebpf.md). | ||
|
||
## Workflow / Methodology | ||
|
||
|
||
|
||
Our approach follows a well-defined workflow: | ||
|
||
1. **Human Experts or LLM Agents design surveys**: Tailored questions for each commit or email. | ||
2. **LLM Agents complete the surveys**: Answering yes/no, tagging relevant data, and summarizing key information. **This is the key step that turns unstructured data into structured data.** | ||
3. **Human Experts or LLM Agents evaluate results**: Ensuring accuracy and uncovering new insights from the structured results. | ||
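
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual implementation: the `Question` class, `stub_agent`, `run_survey`, `to_csv`, the column names, and the sample commits are all hypothetical, and `stub_agent` is a keyword-matching stand-in for a real LLM call so that the example runs offline.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Question:
    """One survey question (hypothetical structure, for illustration)."""
    key: str            # column name in the structured output
    kind: str           # "yes_no", "tag", or "summary"
    content: str        # the question text shown to the agent
    options: tuple = () # allowed tags, used only when kind == "tag"


def stub_agent(question, commit_message):
    """Placeholder for an LLM call: simple keyword rules, not a real model."""
    text = commit_message.lower()
    if question.kind == "yes_no":
        return "yes" if "bpf_link" in text else "no"
    if question.kind == "tag":
        # Pick the first allowed tag that appears in the message.
        return next((opt for opt in question.options if opt in text), "unknown")
    # Summary: reuse the commit subject line.
    return commit_message.splitlines()[0]


def run_survey(questions, commits, agent=stub_agent):
    """Step 2: answer every question for every commit, one row per commit."""
    rows = []
    for commit_id, message in commits:
        row = {"commit": commit_id}
        for question in questions:
            row[question.key] = agent(question, message)
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize the structured rows so they can be reviewed in step 3."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Step 1: a survey designed by a human expert or an agent.
QUESTIONS = [
    Question("is_bpf_link", "yes_no", "Was this commit related to bpf_link?"),
    Question("feature", "tag", "What type of BPF feature is this?",
             options=("uprobe", "kprobe", "xdp", "bpf_link")),
    Question("summary", "summary", "Summarize the main purpose of this commit."),
]

# Made-up commit messages, used only as sample input.
COMMITS = [
    ("abc123", "bpf: add bpf_link support for xdp\n\nLonger body text."),
    ("def456", "selftests/bpf: fix kprobe test flakiness"),
]

print(to_csv(run_survey(QUESTIONS, COMMITS)))
```

Swapping `stub_agent` for a function that sends the question and commit message to a model keeps the rest of the pipeline unchanged, which is the point of treating the survey as both prompt and action plan.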
|
||
### Best Practices for Designing Surveys: | ||
|
||
- A survey acts as both the prompt and the action plan for LLM Agents. | ||
- Focus on yes/no, choice-based, or summary questions. | ||
- Domain knowledge questions should remain with experts. | ||
|
||
## Why LLM? | ||
|
||
LLMs have proven effective at survey, summarization, and analysis tasks in fields like market research and chemistry. With LLMs, we can analyze unstructured data, which traditional methods struggle to handle efficiently. | ||
|
||
### Why Not Other Methods? | ||
|
||
- **Domain Knowledge**: Required for Linux kernel analysis. | ||
- **Unstructured Data**: Commit messages and emails are difficult to process with traditional tools. | ||
- **Expert Cost**: Manually analyzing this data is time-consuming and expensive. | ||
|
||
## Example Questions LLM Agents Can Answer: | ||
|
||
- How do new feature introductions affect kernel stability and performance? | ||
- What identifiable phases exist in a feature’s lifecycle? | ||
- How has a specific eBPF feature, like `bpf_link`, evolved over successive commits? | ||
- What patterns emerge in commit frequency related to specific features? | ||
- What lessons from eBPF development can improve other eBPF runtimes? | ||
|
||
## Configuration Example | ||
|
||
```yml | ||
# Configuration for LLM Agent in Code-survey | ||
task: survey_analysis | ||
memory_access: linux_bpf_database | ||
survey_questions: | ||
- type: yes_no | ||
content: "Was this commit related to bpf_link?" | ||
- type: tag | ||
options: ["uprobe", "kprobe", "xdp", "bpf_link"] | ||
content: "What type of BPF feature is this?" | ||
- type: summary | ||
length: 1-2 sentences | ||
content: "Summarize the main purpose of this commit." | ||
``` | ||
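
As a rough sketch of how answers could be checked against a configuration like the one above, the Python below mirrors the YAML as a plain dictionary (to avoid depending on a YAML parser) and validates one answer per question. The `validate_answer` and `validate_all` helpers and their rules are assumptions made for illustration; in particular, the sentence count is a crude punctuation-based heuristic, not something the configuration format defines.

```python
import re

# The YAML configuration above, written out as a plain dict.
CONFIG = {
    "task": "survey_analysis",
    "memory_access": "linux_bpf_database",
    "survey_questions": [
        {"type": "yes_no",
         "content": "Was this commit related to bpf_link?"},
        {"type": "tag",
         "options": ["uprobe", "kprobe", "xdp", "bpf_link"],
         "content": "What type of BPF feature is this?"},
        {"type": "summary",
         "length": "1-2 sentences",
         "content": "Summarize the main purpose of this commit."},
    ],
}


def validate_answer(question, answer):
    """Return True if the answer fits the question type (hypothetical rules)."""
    kind = question["type"]
    if kind == "yes_no":
        return answer in ("yes", "no")
    if kind == "tag":
        return answer in question["options"]
    if kind == "summary":
        # Crude heuristic: split on sentence-ending punctuation.
        sentences = [s for s in re.split(r"[.!?]+\s*", answer.strip()) if s]
        return 1 <= len(sentences) <= 2
    raise ValueError(f"unknown question type: {kind!r}")


def validate_all(config, answers):
    """Check one answer per question, in order; return a list of booleans."""
    questions = config["survey_questions"]
    if len(answers) != len(questions):
        raise ValueError("expected exactly one answer per question")
    return [validate_answer(q, a) for q, a in zip(questions, answers)]


# Example answers, as an agent might return them for a single commit.
answers = ["yes", "bpf_link", "Adds bpf_link support. Fixes a leak."]
print(validate_all(CONFIG, answers))
```

Rejecting answers that fall outside the declared options is one simple way to support the evaluation step, since malformed rows are caught before anyone analyzes the CSV.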
## References | ||
1. [How to Communicate When Submitting Patches: An Empirical Study of the Linux Kernel](https://dl.acm.org/doi/abs/10.1145/3359210) | ||
2. [Differentiating Communication Styles of Leaders on the Linux Kernel Mailing List](https://dl.acm.org/doi/abs/10.1145/2957792) | ||
This README outlines how Code-survey uses LLM agents to transform unstructured Linux kernel data into actionable insights, particularly in the eBPF subsystem, providing a faster and deeper understanding of feature evolution, design, and collaboration. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,22 +1,5 @@ | ||
# How to define an LLM Agent | ||
|
||
- Task: | ||
- Tool: | ||
- The input survey defines 3 types of questions: | ||
- Answer: | ||
- if yes | ||
|
||
- The tag: choose between usecases: security/network/cgroup/observability... | ||
- if security | ||
- The summary: should be complete in one or two sentences. | ||
- Key words: | ||
- number 1-10 | ||
- Memory: which database can it access? | ||
- Planner (predefined) | ||
|
||
## Config | ||
# How to define a survey | ||
|
||
```yml | ||
|
||
|
||
``` |