genthree 7 hours ago [-]
Relatedly: has anyone profiled the performance and reliability characteristics of rsyslogd (the syslog daemon shipped with most Linux distributions, also available on FreeBSD and other platforms) in the mode where it ships logs to a central node? I've configured and used it with relatively small setups (high single digits of nodes, bursts of activity up to a million or two requests per minute), but I've wondered whether there's a reason it's not a more common solution for distributed logging and tracing (yes, it doesn't solve the UI problem for those, but it does solve collecting your logs).
Like… has anyone done a Jepsen-like stress test on rsyslogd and shared the results? I’ve half-assedly looked before and not been able to find anything.
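For context, the shipping mode being asked about is just a forwarding action on each sending node. A minimal sketch of the client-side rsyslog.conf (the hostname and queue sizes here are placeholders, not recommendations):

```conf
# Ship everything to a central node over TCP,
# with a disk-assisted queue so messages survive network outages.
*.* action(type="omfwd"
           target="logs.example.com" port="514" protocol="tcp"
           queue.type="LinkedList"         # in-memory queue...
           queue.filename="fwdqueue"       # ...spilling to disk when needed
           queue.maxDiskSpace="1g"
           action.resumeRetryCount="-1")   # retry forever on failure
```

The queue and retry settings are what make the reliability question interesting: with defaults, messages can be dropped when the central node is unreachable.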
ehostunreach 32 minutes ago [-]
Since this is an OTel-related submission, you could also use OTel collectors to collect and forward logs to a central OTel collector instance.
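That setup is essentially a filelog (or journald) receiver on each node forwarding via OTLP to a gateway collector. A rough sketch of the agent-side collector config (the endpoint and log path are placeholders):

```yaml
receivers:
  filelog:
    include: [/var/log/app/*.log]   # hypothetical application log path

exporters:
  otlp:
    endpoint: central-collector.example.com:4317
    tls:
      insecure: true   # assumes plaintext inside a trusted network

service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [otlp]
```

The central instance then runs its own pipeline with whatever backend exporter you want.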
> yes it doesn’t solve the UI problem for those, but it does solve collecting your logs
I work for Netdata and over the last couple months, we've developed an external Netdata plugin that can ingest/index OTel logs [1]. The current implementation stores logs in systemd-compatible journal files and our visualization is effectively the same one someone would get when querying systemd journal logs [2].
> Like… has anyone done a Jepsen-like stress test on rsyslogd and shared the results? I’ve half-assedly looked before and not been able to find anything.
I've not used rsyslogd specifically, but I don't see how you'd have any issues with the log volume you described.
We're doing this with a few dozen GiB of logs a day (rsyslog -> central rsyslog -> Elasticsearch). It works reliably, but the config is an absolute nightmare, the documentation is a mixed bag, and troubleshooting often involves deep dives into the C code. We're planning to migrate to Alloy + Loki.
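For anyone curious what actually travels over the wire in a relay chain like that: rsyslog's TCP forwarding sends RFC 5424 messages, optionally with RFC 6587 octet-counted framing. A toy sketch of building one such frame (the hostname and app name are made up):

```python
from datetime import datetime, timezone

def syslog_frame(msg, hostname="web01", app="myapp",
                 facility=16, severity=6):
    """Build an RFC 5424 syslog message wrapped in RFC 6587
    octet-counted framing, as used over TCP."""
    pri = facility * 8 + severity                 # <134> = local0.info
    ts = datetime.now(timezone.utc).isoformat()   # RFC 3339 timestamp
    record = f"<{pri}>1 {ts} {hostname} {app} - - - {msg}"
    return f"{len(record)} {record}"              # length prefix, then message

frame = syslog_frame("request handled in 12ms")
print(frame)
```

The length prefix is what lets the receiver split messages on a TCP stream without relying on newlines.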
nesarkvechnep 5 hours ago [-]
People don’t care about syslog. 98% of my colleagues haven’t heard of it.
malux85 4 hours ago [-]
You are drawing a global conclusion from a tiny sample!
SEJeff 4 hours ago [-]
I wonder how this compares to grafana pyroscope, which is really good for this sort of thing and already quite mature:
As far as I'm aware, Pyroscope itself is not a profiler but a place you can send and query profiles. OpenTelemetry is releasing a profiler, so they aren't directly comparable; one can be used with the other.
sciurus 3 hours ago [-]
You can send profiles collected by opentelemetry to pyroscope.
Very excited for this. We've used the Elixir version of this at $WORK a handful of times and have found it exceptionally useful.
secondcoming 7 hours ago [-]
> Continuously capturing low-overhead performance profiles in production
It surprises me that anything designed by the OTel community could ever meet 'low-overhead' expectations.
tanelpoder 6 hours ago [-]
The reference implementation of the profiler [1] was originally built by the Optimyze team that Elastic then acquired (and donated to OTEL). That team is very good at what they do. For example, they invented the .eh_frame walking technique to get stack traces from binaries without frame pointers enabled.
Some of the OGs from that team later founded Zymtrace [2] and they're doing the same for profiling what happens inside GPUs now!
> For example, they invented the .eh_frame walking technique to get stack traces from binaries without frame pointers enabled.
This is not an accurate summary of what they developed.
Using .eh_frame to unwind stacks without frame pointers is not novel; it is exactly what .eh_frame is for, and perf has had an implementation doing it since ~2010. The problem is that kernel support for this was repeatedly rejected, so the kernel samples kilobytes of stack and then userspace does the unwind.
What they developed is an implementation of unwinding from an eBPF program running in the kernel using data from eh_frame.
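To make the division of labor concrete: an .eh_frame-style unwinder is repeated table lookups. Given a PC, find the rule for recovering the caller's frame, apply it, repeat. A deliberately toy model in Python (the table and stack values are invented; real DWARF CFI encodes far richer rules):

```python
# Toy model of table-driven stack unwinding, the idea behind
# .eh_frame walking. Each entry maps a PC range to:
#   - the offset from the stack pointer where the return address lives
#   - the amount to pop to reach the caller's frame
UNWIND_TABLE = [
    # (pc_start, pc_end, ret_addr_offset, frame_size)
    (0x1000, 0x1100, 8, 16),   # leaf function
    (0x2000, 0x2200, 24, 32),  # its caller
    (0x3000, 0x3300, 8, 16),   # main
]

def lookup(pc):
    for start, end, ret_off, size in UNWIND_TABLE:
        if start <= pc < end:
            return ret_off, size
    return None  # PC outside known code: stop unwinding

def unwind(pc, sp, stack):
    """Walk the fake stack using only the unwind table,
    no frame pointers needed."""
    trace = [pc]
    while (rule := lookup(pc)) is not None:
        ret_off, size = rule
        pc = stack.get(sp + ret_off)   # recover the return address
        if pc is None:
            break
        sp += size                     # pop this frame
        trace.append(pc)
    return trace

# Fake stack memory: address -> stored return address
stack = {0x7f08: 0x2080,   # leaf's return address, into its caller
         0x7f28: 0x3040}   # caller's return address, into main
print(unwind(0x1050, 0x7f00, stack))   # three-frame trace
```

Doing these lookups in eBPF means only the resulting PC list crosses into userspace, instead of kilobytes of raw stack per sample.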
tanelpoder 2 hours ago [-]
True, I should have been more specific about the context:
Their invention is about pushing down the .eh_frame walking to kernel space, so you don't need to ship large chunks of stack memory to userspace for post-processing. And eBPF code is the executor of that "pushed down" .eh_frame walking.
OTel Profiling SIG maintainer here: I understand your concern, but we've tried our best to make things efficient across the protocol and all involved components.
Please let us know if you find any issues with what we are shipping right now.
[1] https://github.com/netdata/netdata/tree/master/src/crates/ne...
[2] https://learn.netdata.cloud/docs/logs/systemd-journal-logs/s...
https://grafana.com/oss/pyroscope/
https://github.com/grafana/pyroscope
https://grafana.com/docs/pyroscope/latest/configure-client/o...
[1] https://github.com/open-telemetry/opentelemetry-ebpf-profile...
[2] https://zymtrace.com/article/zero-friction-gpu-profiler/
The GitHub page mentions a patent on this too: https://patents.google.com/patent/US11604718B1/en