How is this using rsync? Does it invoke an rsync process, or implement the rsync wire protocol? One of my gripes with rsync is I've not found another way to integrate it into other systems. librsync doesn't implement the wire protocol and parsing the output of rsync itself is fraught with peril (for instance the formatting of --no-h has changed before.)
Edit: it seems to invoke an rsync process but doesn't parse stdout for progress. A bit disappointing, but I suppose that's all that's needed for this application.
librsync has absolutely no connection to the rsync utility, it's a distinct implementation of the underlying differential algorithm. The rsync utility doesn't use it, and they don't even share a common data format at any level.
Yes I'm aware, but usually when I voice my gripes with integrating rsync somebody unfamiliar with the matter quickly googles 'rsync library', finds librsync, and assumes it must be the answer to my problem. Then I have to explain to them that it's not. I thought to head that off this time by mentioning upfront that librsync is not the answer.
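As an aside, the core of the differential algorithm that librsync implements is a weak rolling checksum paired with a strong per-block hash. Here is a minimal, illustrative Python sketch of just the weak rolling part (Adler-32-style) - not librsync's actual code or data format, only the idea:

```python
# Weak rolling checksum sketch, in the spirit of the rsync algorithm.
# Real implementations pair this with a strong hash (MD4/MD5) per block;
# this is illustrative only.

MOD = 1 << 16

def weak_checksum(block: bytes) -> int:
    """Two-part weak checksum over a whole block (O(n))."""
    a = sum(block) % MOD
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) % MOD
    return (b << 16) | a

def roll(old: int, out_byte: int, in_byte: int, block_len: int) -> int:
    """Slide the window one byte to the right in O(1)."""
    a = old & 0xFFFF
    b = old >> 16
    a = (a - out_byte + in_byte) % MOD
    b = (b - block_len * out_byte + a) % MOD
    return (b << 16) | a

# The O(1) rolling update agrees with recomputing from scratch at every
# offset, which is what makes scanning a file for matching blocks cheap.
data = b"the quick brown fox jumps over the lazy dog"
n = 8
s = weak_checksum(data[:n])
for i in range(1, len(data) - n + 1):
    s = roll(s, data[i - 1], data[i - 1 + n], n)
    assert s == weak_checksum(data[i:i + n])
```

The cheap rollability is the whole trick: the receiver hashes fixed blocks, the sender slides a window over its file byte by byte and checks the weak sum against the receiver's block table before bothering with the strong hash.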
Just to be sure I'm getting the question - you want to use rsync(1), and you're wondering if there's a good way to guard your code that's using/calling it from underlying changes in rsync options and output, how to read status/progress and such?
Here's an example of a library that wraps rsync: https://metacpan.org/pod/File::Rsync - you can provide callback functions that will be called for each line of stdout/stderr.
Granted, it's in Perl - and chances are your code isn't written in Perl. So if you can't find something similar for your language of choice, and if you can't be bothered with checking out the implementation in Perl and rewriting it...
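For languages without a ready-made wrapper, the File::Rsync "callback per output line" approach is easy to reproduce. A minimal Python sketch - the rsync argv in the comment is just an example and assumes rsync is on PATH (`--out-format` is a real rsync option that pins the line layout so you're not parsing the human-oriented default output); only a generic command is actually run here:

```python
import subprocess
import sys
from typing import Callable, Sequence

def run_with_line_callback(argv: Sequence[str],
                           on_line: Callable[[str], None]) -> int:
    """Run argv, invoking on_line for each line of combined stdout/stderr;
    return the process exit code."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    assert proc.stdout is not None
    for line in proc.stdout:
        on_line(line.rstrip("\n"))
    return proc.wait()

# Hypothetical rsync usage (not executed here):
#   run_with_line_callback(
#       ["rsync", "-a", "--out-format=%i %n", "src/", "host:dst/"],
#       print)

# Demonstrate with a generic command instead of rsync:
lines: list[str] = []
code = run_with_line_callback(
    [sys.executable, "-c", "print('a'); print('b')"], lines.append)
assert code == 0 and lines == ["a", "b"]
```

This still leaves you exposed to rsync changing its output between versions, which is why pinning a machine-oriented format like `--out-format` (rather than scraping the default progress display) is the less fragile option.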
And next to those - Perlito can "Compile Perl to Java/JavaScript/Python/Ruby/Go/etc" https://github.com/fglock/Perlito. I think we had/have some Perl code running in production Hadoop/JVM.
Indeed there is no widespread documentation on how to play with rsync itself. A long time ago I played with the idea of replicating rproxy (https://rproxy.samba.org/), which was just a cool idea. The results are here (https://rproxy.samba.org/) but are very very basic, it was just a toy. I did have to invent my own protocol, but I did it over HTTP.
A major problem with rsync is it relies on filenames matching at the source and destination for an accelerated delta transfer to occur.
If a file gets renamed or copied, modified or not, rsync will transfer the whole file.
They added --fuzzy to try to improve this situation, but it generally only helps with unmodified files copied/renamed within the same directory.
You can build robust solutions around rsync that work efficiently most of the time, but pathological cases such as users having a huge file regularly regenerated using something like a $(date --iso-8601=seconds) filename (think mysqldump backups) are surprisingly common once you have enough users. Even if you can convince them to uncompress such files for delta transfer's sake, something as common as a versioned filename prevents rsync from automatically finding the relationship for use in the delta transfer.
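One partial workaround for versioned filenames is to point rsync at a likely basis directory explicitly. `--fuzzy` and `--link-dest` are real rsync options (and, if memory serves, newer rsync versions let you repeat `--fuzzy` to extend the fuzzy search into the `--link-dest` directories); the paths and the command-construction helper below are hypothetical, and the command is only built here, not executed:

```python
from typing import Optional

def build_rsync_argv(src: str, dest: str,
                     basis_dir: Optional[str] = None) -> list[str]:
    """Construct an rsync command that can seed deltas from basis_dir."""
    argv = ["rsync", "-a", "--fuzzy"]
    if basis_dir is not None:
        # Unchanged files are hard-linked from basis_dir; a second --fuzzy
        # asks newer rsyncs to look for fuzzy basis files there too.
        argv += ["--fuzzy", f"--link-dest={basis_dir}"]
    argv += [src, dest]
    return argv

argv = build_rsync_argv("backups/db-2021-05-01.sql", "host:/backups/cur/",
                        basis_dir="/backups/prev")
assert "--fuzzy" in argv
assert "--link-dest=/backups/prev" in argv
```

It only helps when you (or a heuristic) can guess which prior file is related - which is exactly the relationship rsync can't discover on its own.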
So we need a new tool that is basically git and rsync rolled into one.
Might be easier to use something like zfs and use its features to build a tool that behaves almost like git. Then a (zfs + rsync + [zfs-git hybrid]) monster is born, which may or may not work.
FYI have a look at ZFS. It has some neat traits/behaviours. So my thinking is, someone should write a database that uses gazillions of files (instead of a file per table), a tiny file for each cell/field in a table. Then use ZFS for versioning of data (or git). ZFS can already do software mirroring to drive pools on the machine, so maybe add rsync for remote pools. Optimize it for SSDs and you have a filesystem-database-replicating-versioned store.
Maybe we just need a nice cli/gui for ZFS to make it easier/safer to expose more of its interesting features.
Who is "they"? Are you asking about rsync developers or filesystem developers?
For a filesystem the developers are free to innovate on any form of filesystem-wide block checksumming and content-addressed deduplication approach. This kind of global block-level content awareness is entirely decoupled from filenames and paths.
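To make that filename-independence concrete, here's a toy content-addressed block store in Python: files are ingested as lists of block hashes, so a rename or copy of identical content adds no new blocks. (Fixed-size blocks for simplicity; real filesystems such as ZFS do this at the block layer with checksums and reference counting, not in userspace like this sketch.)

```python
import hashlib

BLOCK = 4096
store: dict[str, bytes] = {}  # content hash -> block bytes (the "dedup table")

def ingest(data: bytes) -> list[str]:
    """Store a file as a list of block references; duplicate blocks are free."""
    refs = []
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        ref = hashlib.sha256(block).hexdigest()
        store.setdefault(ref, block)  # only new content costs space
        refs.append(ref)
    return refs

file_a = ingest(b"x" * 10000)   # original file
file_b = ingest(b"x" * 10000)   # same content under a "new name"
assert file_a == file_b          # identical references; no filename involved
assert len(store) == 2           # one full 4096-byte "x" block + one 1808-byte tail
```

Note the contrast with rsync: here the dedup key is the block content itself, so a rename is invisible, whereas rsync's delta search is keyed on matching paths.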
Using rsync to underpin a filesystem strikes me as a gross hack done by developers punting on the real challenges of modern filesystem development.
WRT rsync doing better than --fuzzy, the options are more limited since it's a userspace tool working within the confines of filesystem APIs like POSIX.
So basically there is no good solution at the moment, and a good solution would probably involve enhancing the underlying filesystem - this was the same conclusion I came to.
Windows Server has a variant of the Rsync algorithm that it uses for its Distributed File System Replication (DFS-R) that can look for up to 4 similar files. It works okay-ish in practice, and is better than nothing.
Minor rant: The trouble with all these lovely Apache projects is their dependency on Java.
If these things were written in C, Go or Rust then I'd leap at the opportunity to explore them and maybe use them in production.
But a Java dependency just brings so much baggage with it. It's all well and good if you've got a Java pro or two on your team. But if you don't, you usually end up spending hours and days troubleshooting obscure JVM problems or some such.
Cannot upvote this enough. I'm sure a frequent Java user will come along to tell me how easy it is - but at this point I'm not interested in delving into their world. If a new application I'm looking at is a single binary I can drop on my system (a la Go/Rust) then great - otherwise it has to be something I really need for me to not just resort to "whatever is in the package manager, or find a different solution".
I'm still trying to figure that out honestly, but https://www.meilisearch.com/ is pretty sweet. Doesn't cover everything ES can do obviously, but it covers a big one.
Funny how I'm the complete opposite. For me, seeing it is written in Java is a big plus.
As soon as I see Java I know it's going to have a whole set of properties that will make it highly manageable to deploy and run. And when they go wrong I have a lot of hooks and tools to understand and resolve the issue that are completely unavailable to me with native applications. You apply the pejorative "baggage" to these as if they are valueless but sometimes you just have to develop enough experience before the utility of things becomes apparent.
(Writing an application in Java however ... I would go straight for a JVM language like Kotlin/Groovy/Scala)
Personally I never want to touch Scala again. Implicit and operator overloading hell. Let alone the shit show that is sbt and c++ compile times from scalac.
Ironically it pushed me back towards Groovy, because I found I was constantly hitting unexpected performance bottlenecks caused by implicit conversions that jumped in and executed things I didn't even realise were happening. Groovy was a completely inelegant but pragmatic option that actually did what I expected most of the time.
Not sure what to say to that, except that it might have been a long time ago, or with some very specific libraries or teams?
If it was years ago, then I can say that things have changed a lot since the last time you touched Scala, and in the right direction. Most of the criticisms from, say, 5 years ago, are being addressed.
For example, if by "operator overloading" you mean "symbolic method names", then those have largely fallen out of fashion. sbt has seen huge improvements (some people dislike it, some people like it, but it's one of the most powerful build tools out there). Compilation times also have improved.
As far as I am concerned, it is my favorite language, and there is so much going for it: great tooling, the upcoming version 3, binary compatibility improvements, rock-solid JavaScript compilation, native compilation under development, and, of course, all the benefits of a language with one of the most powerful type systems around.
Both of you are right. You happen to be a “Java pro” who knows how to use the tool well. It’s a bit uphill for shops without Java experience - more uphill than with other languages, perhaps.
Thank you for pointing this out. I read the title and it sounded nice, but if Java is anywhere near close to it, then no thanks. Thank you, you saved me from wasting time. :)
You have to go back almost 20 years to understand the relationship between the ASF and the Java community. Java at the time was positioned to be a core web technology (originally as applets, then as backend tech via J2EE) and Apache was interested in ensuring the web ecosystem as a whole wasn't locked up in proprietary tech. The ASF was rather influential in working with Sun at the time to foster what eventually became a vibrant open source community.
I know the ASF isn't as trendy these days, but that's never really bothered those of us who volunteer. The focus has always been about a stable, open source ecosystem, primarily, but not exclusively for the internet and its underlying infrastructure. The goal is measured in decades.
It wasn't about being "enterprisey" or anything like that. The ASF as a foundation doesn't really care about the language or tech stack. Being a more mature non-profit, many corporations have found it easier to interface with the ASF than many other open source organizations out there, which does lend itself to seeing a lot of enterprisey donations.
What baggage is specific to Java? You'd want some experts for the language your critical services are written in, regardless of it being java or C or Rust or whatever. Likewise, a java shop wouldn't want to use a critical service written in C.
For Java you need people in operations who are good at things like JVM tuning. Just a fact of life, speaking from experience working on teams that do that.
Languages like C and Rust do not have any equivalent to JVM tuning.
> For Java you need people in operations who are good at things like JVM tuning. Just a fact of life, speaking from experience working on teams that do that.
You really don't. Anything you could do in, say, Go, you can do in Java without touching any JVM flags. The only time you need to do JVM tuning is when you have performance requirements that are simply impossible in most languages.
Yes, Oracle changed the JDK's licensing model from part-proprietary/part-open, part-free/part-paid to 100% open and free (as of JDK 11). What confused some people is that the old website for the semi-free JDK is now for Oracle support customers who pay for support (even though that page directs non-customers to the free version) and Oracle provides the free JDK over at http://jdk.java.net/
Hi Ron. If you're still employed by Oracle, it might not be a bad idea to disclose that, either in the specific comments where you may be talking (semi-)authoritatively about Java and its licensing / commercial model, or within your HN profile. I recall the last talk I saw from you was an AMA at QCon London in 2019 about Java, where you were with Oracle.
I found the comments really clueful but also appreciate when others like e.g. @_msw_ are very overt with disclosure of their employment and interests whilst talking about $employer's stuff on HN (and in his case that'd be AWS and EC2).
Hey Alex! Yes I am with Oracle, and indeed I always make sure to disclose my affiliation: I do so on HN every couple of months or whenever it's relevant (when saying something that could be biased/controversial). It's not just a nice thing to do, but it's also company policy.
Plus, ZGC, the low-latency collector (which gives ~1-2ms max pause times on heaps of up to 16TB in JDK 16), was introduced in 11 and made production-ready in 15.
You would do that even with a GC language like Java. The serious performance code paths in Java also do meticulous memory management to minimize allocations, GC costs, and utilize off-heap memory (Netty comes immediately to mind).
In my opinion it’s the worst of both worlds at that point—you’ve got to manually manage memory and tip toe around the runtime.
You're not wrong, but that's a bit of an unfair comparison.
You're likely talking about work on custom memory allocators and impact on real time performance. In the hosted services domain this isn't so much an issue. Nobody's doing that in Java, and you likewise wouldn't care much if writing the same type of service in Rust.
Rust very much enables you to write high level, Java-style code. It's not bad code, nor is it difficult to write.
I've been staring down compiler flags for two weeks straight, now, since I've been working on cross-compilation toolchains, but that kind of narrative doesn't fit your current level of snark, I guess.
I have run operations for programs written in Java, C++, and Go. I've done it both at small companies and large companies, for everywhere from a rack of dedicated servers running Apache Tomcat to thousands of virtual machines running a mixture of C++ and Go microservices.
The constant thread is... over the last ten years... the people running JVMs complain about tuning the JVM, and always have stories to tell about it. The JVM is fantastically tuneable, and when people online complain about GC pauses or memory problems, there's often some way to tune the JVM to fix those problems. The JVM is amazing, it's a marvelous piece of technology.
But there's also a bunch of people who don't know how to do it and just turn knobs. Like, oh, customers are calling in to complain about latency, so I'll increase the size of the Java heap. (Which, for those not familiar with GC, will make throughput better but make latency worse.) Running services on the JVM means that "working with the JVM" is now a skill you need to select for whatever operations team you have, and if you're a company running a mix of different services (like, I don't know, some databases, memcache, load balancers, etc) then "knowing how to tune the JVM" competes with several other skills you'd like your team to learn.
Just like there are whole conference talks on how to use PGO data properly and optimization flags on AOT compilation toolchains, naturally one needs to spend the effort to learn how to use them and not just type -O2 and go home.
Or those that complain about RDBMS queries being slow, without having normalized their data or written proper indexes.
There is always bunch of people that don't know how to do things, and then there are those among them, that care to improve their skill set and get to know how to turn those knobs.
Sure, but it’s only at large scale that you care about stuff like PGO. My experience is that even small shops care about JVM tuning.
It’s not a question of whether there are knobs to turn. It’s about the typical experience of an actual person running a JVM app versus, say, C or Go, and those experiences are quite different.
And we’re also talking about running someone else’s app. If your database queries are slow, that’s a conversation between the DBA and the devs. If you’re running some Apache app, you’re probably not talking to the devs.
Yeah, I've been a C/C++ programmer for 15+ years with heavy focus on performance, and have not once run PGO. I think it's quite often the case for C/C++ programs that you -O2 and walk away, at least for codebases that your team doesn't specifically own.
Unlike C/C++, you do have to care about tuning, at a minimum to set the memory sizes.
That said you probably can’t get very far in c/c++ without worrying about memory the whole way through. So, you only worry about tuning when writing your code.
Well, it's a trade-off. If someone compiled the C++ app for your architecture, sure – if not, you need to figure out how to compile it and depending on the compiler and set of headers you have installed, it's not always easy. When I say, not always easy I'm being generous, there have been cases when the function signature was different between the sets of headers the author had and what I had...
In Java's case if you have all the class files, all you need is the JVM for your architecture. You don't need the program's author to compile it for your architecture.
And finally, JVM tuning can be performed by the user, but does not need to be. The most frequent tuning users had to do was setting the maximum memory the JVM could allocate, but that hasn't been needed for a long time (since Java 8, I think). Now the JVM, by default, sensibly sizes its heap from the machine's available memory (typically up to a quarter of physical RAM) as the application asks for more.
But you can use 99% of C/C++ applications without any compiler flag trickery.
My experience is not that you can run 99% of Java applications without running into out-of-memory issues or the like. That's why I avoid Java these days when I have the option. And I don't think things have changed all that much in recent years, enough to make my experience no longer valid.
I've been using Java daily for years at this point. The only time I've had that sort of issue is when the program I was writing dealt with large files in memory, and it was solved with a simple -Xmx16G.
Hah, I still use laptops with 2G and 4G :) They work perfectly fine when you know what you are doing. But running Java with 16GB pool does not fall in that category. The same holds for small cloud instances.
Admittedly, even with a C program, a problem requiring 16GB might not run well on a 2GB machine, but the fact remains that Java applications are often much more memory hungry than comparable C/C++ implementations. And if you have the memory, they will just use it without any -Xmx. Needing to allocate memory in advance really sounds like computing in the 1960s/70s. (I'm old enough to have seen such stuff, although not when it was new.)
ZGC, the low-latency collector (which gives ~1-2ms max pause times on heaps of up to 16TB in JDK 16), was introduced in 11 and made production-ready in 15.
In general, the way the GCs work is just very different from 8.
The JVM that a Java shop is using is likely written in C, which should make you question the straight equivalence you're setting up. There is an operational difference between getting a native executable and having something that sits on top of Java (or any other managed platform).
Not sure what you mean by "managed platform". Java being written in C doesn't mean you need to know C to run Java. You hardly deal with any native code at all unless you write your own JNI interface or do some obscure tuning, which is pretty rare.
I don’t see how it does at all - the point isn’t C versus Java, it’s native versus managed. Any reasonably proficient team running managed services needs to know how to deal with native dependencies, but the reverse is not true.
not in my view, as a person who knows neither java nor c. but i think the counterargument the people in this thread would make would be something like:
i can write my C programs without ever thinking about java. it is irrelevant to me. however, C is very important for java since the JVM it is running on top of is written in C.
i think, personally, the outcome of this type of thinking is that "therefore every java program is in fact just cruft on top of C", which, as someone who does 80% of their job in SQL, i am ill-equipped to object to on java's behalf.
You’ll note the word “likely” in my comment. And there’s nothing special about C in my argument - I was responding to a comment which used it as an example. The point is simply that one way or another any given team likely knows how to tune, monitor and/or diagnose issues in native bits - in fact even a Java focussed team likely has these skills. But the reverse is clearly not true.
For me, at this point, the biggest baggage is dealing with Oracle and funneling them money if you want an LTS JVM, alongside the shifting sands of their licensing rules. Unfortunately it can still be a bit of a wheel of fortune as to how well (if at all) something runs on OpenJDK.
Between the open-world assumption the classloader uses, and the lack of a performant, general purpose garbage collector, you can’t seriously consider deploying Java at scale without having experts on hand.
I’ve deployed things written in many languages at extremely large scale. They’ve been written in Java, C, C++, python, go, sh, perl, and other languages I’ve forgotten. Java is, by far, the least operable of those languages.
I’m an expert Java developer, along with most of those other languages (where expert is defined by me as “over 10 years experience”).
In fairness to java, it’s not my least favorite language on the list.
One of the most important and valuable (at least to me) things about Java and the JVM is its ecosystem. And Apache is a part of that ecosystem. BTW, a lot of the Apache projects aren't even in Java and don't depend on the JVM.
> If these things were written in C, Go or Rust...
Go and Rust have very good package management (Go's kinda sucked but go modules is helping a lot), so yes. They both work much nicer than the occasions I've used maven etc.
I think that package management is very different from an ecosystem. I'm somewhat aware of Rust's and Go's package managers, but I'm really curious whether they have ecosystems similar to Apache's. I wouldn't be too surprised if the Apache ecosystem embraces more and more Go and Rust projects as time goes on...
You can check if there is a project like that in the incubator, or suggest one if you have an idea or some code to donate.
Most new projects are related to existing ASF projects, or use ASF libraries, or are created by someone already in the ASF that maybe use/prefer Java.
But there is no requirement on a project being Java.
See Airflow, Arrow, Log4net, httpd... it's just a question of people creating the proposals that must show a possible community of users/devs to use/support it.
Unless you are happily using a JVM-hosted language of course (Clojure in my case), in which case it is a huge advantage. I can use this immediately and easily in production.
It would be useful for you to recognize that you are letting an emotional reaction stop you from using useful technology. I understand that Java makes you uncomfortable, but it is a powerful and stable language and you're missing out by avoiding it. It would take much more time to implement a similar project in C, and it would likely have lower stability. Yes, implementing it in C would use a lot less memory, and if that's what you're optimizing for, going with something other than Java is a wise choice most of the time.
Most technology is useful, depending on what you're optimizing for. I urge you not to optimize for being comfortable.
The language isn’t the problem. The runtime is the problem. It’s a dependency mess, a licensing lottery, and a tuning crapshoot. The overhead of simply making it work, and maintaining that status across heterogeneous environments, is more than non-Java shops want to undertake.
Having spent much of the first half of my career writing C and Perl and then Java, I was very happy to set aside the latter two - for completely differing reasons.
The OP's points are valid. Filesystem work requires experience with system calls and I/O behavior. Java strives to abstract the underlying system away from the developer. System-level tooling written in Java frequently requires twice as much expertise because there are two big systems in play instead of one, hence the unnecessary baggage.
My point was not about this project in particular, but a general one. I have seen this reaction towards Java often. Java is not familiar; for many people Java is inscrutable. I think this is because of the existence of the JVM and the need for the user to be aware of its existence. Rust is not simpler than Java, but the user just deals with a compiled executable and doesn't see any complexity. Maybe GraalVM with native-image will help Java in this regard.
It's not an emotional reaction they were expressing at all - they even explicitly stated it's because of the need to bring in someone savvy with Java and the JVM to be able to support it. Sometimes there aren't the resources.
Someone not having an emotional reaction would not use expressions like "All the worst and most time-consuming troubleshooting experiences of my life have involved Java" and "the JVM decided to vomit all over my screen".
"All the worst and most time-consuming troubleshooting experiences of my life have involved Java" is simply a statement.
All the worst and most time-consuming troubleshooting experiences of my life have involved dealing with a legacy VB6 app with a truly terrible MSSQL database structure - that's not an emotional statement, it's a fact.
> All the worst and most time-consuming troubleshooting experiences of my life have involved dealing with a legacy VB6 app with a truly terrible MSSQL database structure
Do you feel any powerful emotion when you think about that experience? Would you be indifferent if you would need to do it again?
Yes, dread. But it doesn't make the statement any less true. Myself and every single person maintaining that system felt exactly the same - it was a complete mess.
It was the owner of the platform & its developers, neither technology - I think we both understand that.
I was just trying to make the point that it's not necessarily emotional for someone to not want to work with a certain tool without an expert available because of terrible past experiences where that was the case.
I'd say it's almost impossible to imagine any past event someone has present at without having some emotional response.
The emotional reaction is the resistance to Java and the assumption that it will break and you will need a Java expert to fix it. I don't know much C, and I use MySQL and don't need to worry about C internals at all when using it. Just some configuration stuff, and maybe installing some C build tools. I wouldn't expect it to be much different for this project.
I think you've overlooked the OP's point a bit. You don't need to understand the C/libc internals to be able to operate MySQL (you do need to know its quirks, though). With a JVM application, you're going to need to understand the application-level configs - think of something like Spark - and if you're doing anything with large amounts of data or high RPS, you're also going to need to understand the JVM internals. The closest analog might be needing to understand the libc allocator for certain usage patterns, and that happens, but it's far more rare than someone needing to tune garbage collection algorithms and parameters. JVM tuning is just a reality of operating in the Java ecosystem.
> java is very easy to troubleshoot compared to c, go, and rust
All the worst and most time-consuming troubleshooting experiences of my life have involved Java. Whether staring at pages of obscure stacktraces that the JVM decided to vomit all over my screen, or hours on end with Oracle tech support troubleshooting why Java decided to throw a tantrum (followed by editing obscure Java XML config files).
I really haven't seen all that much XML in Java recently outside of legacy applications... there's libraries for loading config data from JSON, YAML, etc etc.
JSON, YAML, or XML, it's all the same crap. The fact that programmers are feeling the need to externalise their business logic into "config" files is a sign that the language has failed. In a decent language you would write logic or "config" in the language itself, because the language would be a comfortable medium for expressing things (this is the norm in Python for example - while a few overengineered pieces insist on using TOML or something, most Python libraries don't require any separate config, you just "configure" them the same way you do the rest of your programming, in Python).
well yes... it's a learning curve and it takes a while, but if you are familiar with the JVM and XML and to some degree spring-framework and reflection, it's usually pretty easy to find a root cause - you also have great tooling like async-profiler, or just sending `kill -3 <pid>` to get a stacktrace. At least you've got a hill to climb instead of an empty field.
As an example, this app (Apache Helix) implements MBeans, which means that you can connect to the running application to observe its state and manage it. You can do this with the standard JMX toolchain, without having to implement anything else in the app other than the MBeans.
I realize, writing this, that for someone unfamiliar with the ecosystem this sounds very abstract, but MBeans (managed beans) are simply counters, operations, stats, faults that the app makes available to other apps/users via a standard protocol.
How extensive is your experience with other toolchains? Programmers notoriously use easy as a synonym for familiar and given how long Java has been around by now we'd have some evidence showing a major productivity advantage. Companies like Google with massive Java experience also investing heavily in other languages suggests that this is not so compelling.
Being able to off-the-cuff use JMX to do realtime heap and stack analysis and then go deep with an offline heap dump (via Eclipse MAT, for example) is quite nice. On top of it you have a standardized interface for emitting counters and gauges. You can kind of get that through effort with other ecosystems (closest probably being .NET), but with the JVM it's just there and it's standardized. Have a memory leak that takes days of operation to manifest? Set up a trigger to heap dump at a certain time, then open up MAT and you can find some very subtle memory leaks and lock contention issues. Any JVM is basically running in a valgrind sandbox, with enough symbols to support a low overhead gdb connect, with a standard API for emitting meta-performance data.
All the above can be done in other environments, but generally with specially prepared deployments or executables, with choices being made along the way as to how to do it. I think the two key things are that in the JVM the heap is actually a quite structured database, which allows for introspection without recompilation or special tooling, and there's a standard mechanism for exposing detailed performance data.
Whether or not that translates into actual productivity differences is tricky. In my experience large companies tend to build their own equivalent and better-targeted tooling, and smaller companies increasingly pass their performance diagnostics to SAAS companies. It takes time to learn diagnostic tools and procedures in general, and in my experience a lot of Java teams don't know the tooling they have. I'd say the main productivity gain would be in quickly diagnosing production issues. You could argue that the JVM leads to software thinking that increases production issues by relying on long-running processes that need the stability to survive, but in my experience there are definitely niches where that is the only way to meet your performance goals.
>Programmers notoriously use easy as a synonym for familiar and given how long Java has been around by now we'd have some evidence showing a major productivity advantage
Have you ever heard the phrase "the dog that didn't bark?" You appear to be using an absence of evidence to prove evidence of absence, not to mention an argument from ignorance (which is always possible in our post-Gödel world).
That's a rather pretentious way to miss the point. The question was “Can you expand on this? Why do you think that?” and the response was a sweeping assertion and the name of a couple of products with no further details, or even a clear indication of which of the three languages mentioned further up-thread or others were being used as a basis for comparison. I wasn't asking for a peer-reviewed paper but even, say, “My team had a productivity loss when they switched to Go because [reasons]” would be better than nothing at all.
> java is very easy to troubleshoot compared to c, go, and rust
>> Can you expand on this? Why do you think that?
So the comparison is C, Go, and Rust. Yes, I'm familiar with all three of them, and Java is easier to troubleshoot than any of them because it has better introspection out of the box.
That’s getting closer to what would have been an informative response. It could use further expansion, but even that much would allow someone reading it to weigh how much it would remove a limiting factor in their experience.
For example, also having experience with all of them, the overall advantage over C is clear-cut, but the comparison with Rust is a lot less clear, since its stronger type system and better culture around package management and complexity avoid a lot of problems that would otherwise require runtime troubleshooting.
Granted, a key part of that is really the question of whether you’re talking about a modern Java project or the more common enterprise Java sort with layers of accreted complexity and probably architecturally frozen at Java 8, where the problem is cultural rather than the language itself.
> That’s getting closer to what would have been an informative response. It could use further expansion, but even that much would allow someone reading it to weigh how much it would remove a limiting factor in their experience.
Consider that you've gotten more useful information from my response than what you've paid for. If you want the whole picture, buy the book. Your sense of entitlement to more of my time seems misplaced.
>Companies like Google with massive Java experience also investing heavily other languages suggests that this is not so compelling.
Google's scale means that performance actually matters a lot versus productivity. When you've got millions of servers the costs add up. So making something twice as fast but taking 25% more dev time is probably a good tradeoff for them. For most other companies that's not the case.
Java is extremely old and stable. The obsession with Rust as open-source savior is overwrought. Rust may be good, but Java has been around for 25 years; give Rust another 10 years and 3 major language revisions and then I'll trust it.
No, it's not. My experience is that for poor programmers, Java is easy because they lack the fundamental skills, and are familiar w/ the mechanics of Java. Of course this is a broad generalization and terribly unfair empirical anecdata.
Also, every single Java application I've ever used has used an entirely unreasonable amount of memory for the job it's supposed to do. I'm sure you can write things in Java that are lightweight on resources, but I've never seen it.
I also wonder why are they using Java? Filesystems should be written in something like Rust. Using languages like Java for systems programming always looked bizarre to me.
I love using things written in Rust, because if nothing else I know it's going to be resource efficient and the deployment will be nice and easy.
Java applications: I hope there's enough spare memory, and I hope the setup instructions aren't littered with "set this obscure JVM config in some obscure way". It isn't the application config, and nobody tells you how or where to set it; you've just got to figure that one out.
I don't see a reason for users to be aware of Maven; similarly, when you get a C or Go app, you aren't aware of the build tool that was used to create it.
While I agree that some of the Apache projects would be better written in something other than Java, the comment about spending hours debugging obscure JVM issues is a huge exaggeration.
Java is pretty much the only language I'd want to see low-level libraries written in. Go libraries can only be used from Go, Rust libraries can only be used from Rust, and any nontrivial C library contains an arbitrary code execution vulnerability when built with a newer compiler. Java is a lowest-common-denominator language but it's memory-safe, cross-platform (in a way that C# isn't really yet), and there's a decent range of languages that you can run on the JVM and use Java libraries from.
If a Go or Rust library takes the time to offer a C-compatible interface (most don't), or you add one yourself, then you can invoke it via the anaemic, unsafe C ABI - no first-class objects, functions, or sum types, raw pointers everywhere. At that point it might as well be a C library, because any program that uses it is still inherently memory-unsafe.
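To make that flatness concrete, here's a sketch of calling through the C ABI from Python's ctypes (libm's sqrt is just a convenient stand-in, and the libm.so.6 fallback assumes a glibc system): every argument and return type has to be re-declared by hand, and nothing checks the declarations against the library.

```python
import ctypes
import ctypes.util

# Locate the C math library; fall back to the common glibc soname.
path = ctypes.util.find_library("m") or "libm.so.6"
libm = ctypes.CDLL(path)

# The ABI carries no type information: argument and return types
# must be declared by hand, and nothing verifies them against the
# library. Get them wrong and you read garbage, not an error.
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

root = libm.sqrt(9.0)
```

Get those declarations wrong and you don't get a type error, you get whatever bits happen to be on the stack, which is the memory-unsafety being described.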
I wonder how near-realtime this is. It seems that will dictate the types of applications this will be applicable for.
I currently use Gluster which has its downsides, but is what I consider near-realtime, but with full filesystem features in a nice fuse wrapper. I'd love it if someone would contrast the performance and more of the features.
In the mid-2000s (the "oughts?") I went to my doctor at PAMF with a sore throat and he said he would do a strep test.
I asked him if we would get "real time results?"
He stopped, and looked at me puzzled with a tilted head... "What other kind of time is there?"
Your question is highly reasonable, but I heard a colleague today say that the issue could happen only in a very, very short time period of, say, 2-3 minutes. We routinely work with corner cases of 1-2 ms; so the context is relative :-)
"context is relative" made me smile, thinking "PAMF" is Palo Alto Medical Foundation, but why would anyone in general have the context to understand that
Gluster's performance with metadata heavy workloads is very poor. If you're running files with a high ratio of data:metadata, it works well; if you're running something like a git repo on Gluster it is not a great experience.
It's also thrashing a bit development wise; there's a lot of features being pruned, and I'm a little concerned what its future looks like as Red Hat push harder down the Ceph route, since they're the main source of code contributions to Gluster.
I use Resilio Sync in production with my servers. I like it more than this solution because it does not require a primary and replica design but instead uses bit torrent where all servers are equal. I can add a new server and drop an old server and it handles everything gracefully.
Haven't come across this before. At a glance, it looks like it's a file sync tool for small scale use, rather than a "serious" tool for replicating a file system.
Curious to know more about your production use case?
I used to run a distributed system based on a few bash scripts and rsync that worked in real time. It became more complex when each "cell" exceeded the capacity of a node. The file naming scheme was not flexible enough to split these cells across smaller nodes, and I abandoned writing a router for Nginx to direct requests to the right server based on the filenames. I just decided to throw a couple of big machines at the problem instead, and that removed all the complexity. It wouldn't have been possible just a couple of years ago though.
For a couple of years the rsync solution worked very well.
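For reference, driving rsync from a script usually means spawning the binary and, if you need per-file detail, parsing its --itemize-changes output. A hedged Python sketch: the 11-character flags field is documented in rsync(1), but rsync's output formatting has changed between versions, so treat any such parsing as fragile.

```python
import subprocess

def parse_itemized(line):
    """Split an rsync --itemize-changes line into (flags, path).

    The flags field is the 11-character YXcstpoguax string described
    in rsync(1); everything after the separating space is the path.
    """
    flags, _, path = line.partition(" ")
    return flags, path

def changed_files(src, dest):
    """Run rsync in dry-run mode and yield (flags, path) per change.

    Assumes an rsync binary on PATH; a real integration would also
    check the exit status and capture stderr.
    """
    proc = subprocess.run(
        ["rsync", "-a", "--dry-run", "--itemize-changes", src, dest],
        capture_output=True, text=True, check=True,
    )
    for line in proc.stdout.splitlines():
        if line:
            yield parse_itemized(line)

# A ">f.st......" prefix means a regular file being transferred
# because its size and mtime differ.
flags, path = parse_itemized(">f.st...... docs/readme.txt")
```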
I'm using a distributed filesystem for persisted files (config, assets and resources, etc - anything that'd need to be in a docker volume and persisted between restarts) in my container workloads. This mostly works fine, except for sqlite applications that insist on using WAL - this completely breaks and results in corrupted database files.
"Near-realtime" isn't good enough for SQLite. It has a strong dependency on the consistency guarantees provided by a POSIX-like filesystem.
The approach described in this page does make some attempt to provide consistency. The master generates a single stream of file updates, and all of the replicas consistently apply those changes in the same order. But rsync is not guaranteed to observe changes to files on the master in the same order that they were written.
In particular, SQLite (by default) uses a rollback journal to recover from failures. While a transaction is in process and some parts of the database might have been partially updated, the journal stores the old contents of the modified pages. Since the old data is fsync'ed to the journal before the new data is written to the database file, the data on disk at any moment in time is recoverable.
But there's no clean way for rsync to read an atomic snapshot of both the database file and the journal at the same instant in time. If it reads them at different times, a database change might be partially applied or partially rolled back, which has a high likelihood of corrupting the database.
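One way to sidestep this on the master, outside the replication layer, is to snapshot the database through SQLite itself rather than copying files. For example, Python's sqlite3 module exposes SQLite's backup API, which holds a read transaction for the duration, so the copy reflects a single point in time; the resulting snapshot file is then safe to ship with rsync.

```python
import os
import sqlite3
import tempfile

def snapshot(db_path, snap_path):
    """Take a transactionally consistent copy of a SQLite database.

    Unlike copying the .db and journal/WAL files separately (as rsync
    would), Connection.backup reads the database under a transaction,
    so the copy can never mix old and new pages.
    """
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(snap_path)
    with dst:
        src.backup(dst)
    src.close()
    dst.close()

# Usage: write a row, snapshot, and read it back from the copy.
d = tempfile.mkdtemp()
db, snap = os.path.join(d, "a.db"), os.path.join(d, "a.snap")
con = sqlite3.connect(db)
con.execute("CREATE TABLE t (x)")
con.execute("INSERT INTO t VALUES (42)")
con.commit()
snapshot(db, snap)
rows = sqlite3.connect(snap).execute("SELECT x FROM t").fetchall()
```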
This won't work for database files, and it won't work for any other application that requires a filesystem to behave consistently over a network either.
The property you are looking for in a distributed filesystem is called strong consistency, or cache coherent (different name, same effect).
If a distributed filesystem does genuinely provide that property, it should be ok for SQLite and other applications, because it means the filesystem behaves the same as if it were multiple processes on the same local system accessing it.
Most distributed filesystems don't provide that guarantee though. It's a very nice guarantee, but it comes at a complexity cost, usually a performance cost (although there are clever ways to approach the performance of a non-consistent system), and in particular it means the filesystem can't be used when the network connection is down.
There is another property called durability, which you might also care about. That affects whether you get corruption when devices fail or are rebooted suddenly.
I actually don't need strong consistency here: I'll only have a single instance of the container accessing the SQLite file at any one time, but it can be running on any host. As long as I can ensure replication has synchronized before the process starts when a container gets rescheduled, we're good.
It goes over my head a bit exactly why (related to POSIX lock mechanisms?), but SQLite doesn't work reliably on NFS/Ceph/Gluster/networked filesystems, even when only accessed from a single host, distributed or not.
> It goes over my head a bit exactly why (related to POSIX lock mechanisms?), but SQLite doesn't work reliably on NFS/Ceph/Gluster/networked filesystems, even when only accessed from a single host, distributed or not.
I don't see anything in that forum thread which says SQLite is unreliable on sufficiently POSIX-ish network filesystems. Single host or not.
The linked post says it will be slower to commit write transactions than is possible using other methods, but I don't see anything there saying it's unreliable.
(What looks unreliable to me is Joelmo's proposal to remove the WAL lock for better performance. But the lock is there for a reliability reason, you can't just remove it. To get the boost in performance Joelmo would like requires significantly different techniques.)
You could try implementing a small shim over the FS using POSIX or Dokan which just passes through to the other FS but makes sure writes are synced correctly. It would probably be quite slow, but it sounds like you mostly care about having something that works.
> Most distributed filesystems don't provide that guarantee though. It's a very nice guarantee, but it comes at a complexity cost, usually a performance cost (although there are clever ways to approach the performance of a non-consistent system), and in particular it means the filesystem can't be used when the network connection is down.
Which distributed filesystems do provide this guarantee? I can't think of one, so I'm hoping to learn something new today.
Heh, even Microsoft SMB provides some level of coherency guarantee combined with performance, using its read and write leases. NFSv4 has a similar concept. It's an old idea by now.
I actually don't know of any distributed (multi-master, fully replicated) filesystems that are published with this property. I only know it's possible because I designed one that isn't published.
The basic principle is similar to CPU MESI caching but with predicate-scopes suited to a filesystem rather than cache lines. (Predicate-scopes are similar to predicate-locks in databases). As you know, multi-core CPU systems remain fast despite sharing memory, incurring significant overhead only for the changes that need to be communicated between cores. Same applies on a network.
> To facilitate this, the master logs each transaction in a file and each transaction is associated with an 64 bit ID in which the 32 LSB represents a sequence number and MSB represents the generation number The sequence number gets incremented on every transaction and the generation is incremented when a new master is elected
So, what happens after 2^32 transactions without a re-election?
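For illustration, the ID layout described in the quote can be sketched as follows (function names are mine, not the project's); note what a naive increment of the full 64-bit ID does at the sequence boundary.

```python
def make_txn_id(generation, sequence):
    """Pack generation (high 32 bits) and sequence (low 32 bits)."""
    return (generation << 32) | (sequence & 0xFFFFFFFF)

def split_txn_id(txn_id):
    """Recover (generation, sequence) from a packed 64-bit ID."""
    return txn_id >> 32, txn_id & 0xFFFFFFFF

# At sequence 2**32 - 1, naively adding 1 to the full 64-bit ID
# carries into the generation field, which is supposed to change
# only when a new master is elected.
last = make_txn_id(generation=7, sequence=2**32 - 1)
gen, seq = split_txn_id(last + 1)
```

Whether the real implementation increments the packed ID or the sequence field separately (and how it handles the wrap) is exactly the question being asked.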
As I haven't seen it mentioned yet, if you want bi-directional sync of changes with relatively low overhead, I've found csync2 to be reasonably good for this task.
Unison is only intended for two "points" isn't it? i.e. it's meant for pretty narrow range of "client : server" sync jobs, and is essentially unusable without exactly the same patch version on both ends.
For purely server-side "cluster" sync I've found csync2 to be a reasonable solution.