
socket.error: [Errno 32] Broken pipe #2

Open
mitalhp opened this issue Sep 23, 2014 · 12 comments

Comments

@mitalhp

mitalhp commented Sep 23, 2014

I was getting a broken pipe error, which I believe is caused by socket size limits when sending to Graphite, so I changed send_to_graphite to chunk up the data, which seems to have fixed the issue. Not sure if this is the best way to handle it (it doesn't work with newer versions of the script since the threading was added).

import pickle
import socket
import struct

def chunks(data, size):
    # Yield successive slices of at most `size` metrics.
    for i in xrange(0, len(data), size):
        yield data[i:i + size]

def send_to_graphite(metrics, chunksize=500):
    if args.debug:
        for m, mval in metrics:
            log('%s %s = %s' % (mval[0], m, mval[1]), True)
    else:
        if chunksize:
            chunked_metrics = list(chunks(metrics, chunksize))
        else:
            chunked_metrics = [metrics]  # a single chunk holding everything

        log('total %s chunks of %s size' % (len(chunked_metrics), chunksize))
        for c in chunked_metrics:
            log('sending chunk')
            # One length-prefixed pickle payload per chunk keeps each
            # send small enough to avoid the broken pipe.
            payload = pickle.dumps(c)
            header = struct.pack('!L', len(payload))
            sock = socket.socket()
            sock.connect((args.graphite_host, args.graphite_port))
            sock.sendall('%s%s' % (header, payload))
            sock.close()
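The chunking logic can be sanity-checked on its own. A minimal Python 3 sketch (the script itself is Python 2, where `range` would be `xrange`; the metric tuples here are fake data):

```python
# Standalone check of the chunking helper above, with fabricated metrics.
def chunks(data, size):
    # Yield successive slices of at most `size` items.
    for i in range(0, len(data), size):
        yield data[i:i + size]

metrics = [('es.metric.%d' % n, n) for n in range(1200)]
chunked = list(chunks(metrics, 500))
print([len(c) for c in chunked])  # -> [500, 500, 200]
```

With the default chunksize of 500, 1200 metrics become three payloads instead of one oversized send.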
@apple-corps

I just saw the same myself with the latest version.

2015-07-07 16:03:09,444 [MainThread es2graphite.py :submi:174] [ERROR ] Communication to Graphite server failed: [Errno 32] Broken pipe

@apple-corps

What's with the debug messages being url encoded anyhow?

2015-07-07 16:06:09,224 [MainThread es2graphite.py :submi:175] [DEBUG ] Traceback+%28most+recent+call+last%29%3A%0A++File+%22.%2Fes2graphite.py%22%2C+line+172%2C+in+submit_to_graphite%0A++++graphite_socket%5B%27socket%27%5D.sendall%28+%22%25s%25s%22+%25+%28header%2C+payload%29+%29%0A++File+%22%2Fusr%2Flib%2Fpython2.7%2Fsocket.py%22%2C+line+228%2C+in+meth%0A++++return+getattr%28self._sock%2Cname%29%28%2Aargs%29%0Aerror%3A+%5BErrno+32%5D+Broken+pipe%0A

URL-decoded, that is:

    Traceback (most recent call last):
      File "./es2graphite.py", line 172, in submit_to_graphite
        graphite_socket['socket'].sendall( "%s%s" % (header, payload) )
      File "/usr/lib/python2.7/socket.py", line 228, in meth
        return getattr(self._sock,name)(*args)
    error: [Errno 32] Broken pipe

@Ralnoc
Contributor

Ralnoc commented Jul 8, 2015

I'll look into this. I have yet to experience the issue myself.

As to the URL encoding: I added that for the traceback output so that those messages can be sent through a standard syslog application, which would normally break multi-line output into multiple messages. This ensures the whole message reaches the remote destination in a usable form.
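The encoding described above can be sketched roughly like this (a Python 3 sketch using `urllib.parse.quote_plus`; a Python 2 script like es2graphite would use `urllib.quote_plus` instead):

```python
import traceback
import urllib.parse

def encode_traceback():
    # Collapse the current multi-line traceback into one URL-encoded
    # token, so newline-splitting syslog transports keep it whole.
    return urllib.parse.quote_plus(traceback.format_exc())

try:
    raise ValueError('demo')
except ValueError:
    encoded = encode_traceback()

print('\n' in encoded)  # -> False: every newline is now %0A
```

The receiving side can recover the original text with `urllib.parse.unquote_plus`.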

@apple-corps

@Ralnoc I think you probably haven't experienced the issue because you don't have enough Elasticsearch content to need chunking. Not sure why @mitalhp's chunking won't work with threading.

@Ralnoc

Ralnoc commented Jul 9, 2015

@Drocsid Could you detail the exact arguments you are using? What health-level? Are you using shard-stats, etc? I need to try and replicate the issue.

@apple-corps

python2 ./es2graphite.py --stdout --log-level debug es.server:9200 -g graphite.server -o 2004 .

I also needed to comment out some lines to get the stats into my Graphite dashboard. I'm also curious about the round-robin approach. It appears that all the _GET requests use the same Elasticsearch host.

@Ralnoc

Ralnoc commented Jul 10, 2015

What sections did you comment out? Also, I don't follow the question about round robin. The code always queries the same host; each _GET request is for a different stats URI.

@apple-corps

There was a stack trace like this (URL-decoded for readability):

    2015-07-02 15:27:13,240 [MainThread] [ERROR   ]
    Traceback (most recent call last):
      File "./es2graphite.py", line 290, in <module>
        get_metrics()
      File "./es2graphite.py", line 240, in get_metrics
        indices_stats_metrics = process_indices_stats(args.prefix, indices_stats)
      File "./es2graphite.py", line 121, in process_indices_stats
        process_section(int(time.time()), metrics, (prefix, CLUSTER_NAME, 'indices'), stats['indices'])
    TypeError: process_section() takes exactly 5 arguments (4 given)
    2015-07-02 15:27:13,241 [MainThread] [INFO    ]  2015-07-02 15:27:13: GET

I had a look at this, and it looks like the issue was coming from https://github.com/mattweber/es2graphite/blob/master/es2graphite.py#L119

So I commented out the related lines...

@Ralnoc

Ralnoc commented Jul 11, 2015

OK. It looks like something is going on in the indices-level stat gathering. If you uncomment that section of code and just set --health-level cluster, you can run it and bypass that code. I'll have to run some tests and see why that issue is manifesting.

@Ralnoc

Ralnoc commented Aug 25, 2015

@Drocsid - The issue you were experiencing is different than the one described by @mitalhp. Your issue turned out to be that the index collection call to process_section wasn't updated to the new format. That issue has been moved to #14. I'm continuing to investigate the broken pipe issue, but I have yet to run into it.

@apple-corps

@Ralnoc

The broken pipe is likely due to having a large number of indices and stats in the cluster. I re-used and modified some of the functions from these scripts, but hacked them heavily to create a custom-tailored Graphite dashboard. I had an interest in different metrics, but this served as a good quickstart entry point for me. Unfortunately, I don't think my hacks are polished enough, but I might think about checking them in if there's any interest. Thanks.

@AlexClineBB

I can confirm that the broken pipe issue is caused by a large number of indices and stats. To mitigate it, I modified the stats URL (L268) to request only the stats that I needed. This reduced the size of the JSON object and fixed the timeout.
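For context, Elasticsearch's indices stats API accepts a comma-separated metric filter, so a narrowed URL along these lines shrinks the response. A minimal sketch; the host, port, and metric names here are assumptions, not the thread's actual change:

```python
# Hypothetical sketch of a narrowed stats URL. Requesting only the
# metric groups you graph keeps the JSON response (and the pickled
# Graphite payload built from it) small.
WANTED_STATS = ['docs', 'store', 'indexing', 'search']  # assumed subset

def stats_url(host, port, stats=WANTED_STATS):
    # e.g. /_stats/docs,store instead of the full /_stats response
    return 'http://%s:%d/_stats/%s' % (host, port, ','.join(stats))

print(stats_url('es.server', 9200))
# -> http://es.server:9200/_stats/docs,store,indexing,search
```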
