Monday, November 7, 2011

Is Realism Unrealistic?

ISS spacewalk
Realism is viable so long as you only want to do realistic things. Exploring space is realistic. Mining it, colonizing it, or fighting Epic Battles in its far reaches are all operatic things. None of them is impossible. But all are highly unlikely, at any rate in the readily foreseeable future, given technology as we know it.

In fairness to me, I have generally discussed the 'plausible midfuture,' a time and place that is characterized as plausible, but not necessarily most likely, or even very likely at all. But that doesn't do away with the problem. Does an implicit requirement for 'plausibly' demi-realistic technology amount to an unrealistic constraint when applied to essentially operatic settings?

Or, putting it another way - and bringing it down to cases - given that deep space warfare is pretty damn unlikely under the technological constraints I have discussed here, does the Space Warfare series really provide any useful help for writers or gamers who want to make their space battles more convincing? Wouldn't they really be better off adopting the operatic technology of their choice, then working out the implications for combat under those conditions?

This came up in a recent comment thread, in which John Lumpkin's novel, Through Struggle, the Stars, got taken in vain. I have not read the book, so I have no informed opinion on how convincing his setting is. But the question raised is much more general - does putting 'realistic' spaceships in a setting of space colonies make it less plausible than allowing a technology that would justify deep space travel as convenient and cheap?

As I have noted here before, my bias toward realistic-style spacecraft (and other details of a setting) is essentially aesthetic. Magitech is inherently arbitrary. It is well done if it holds together with internal consistency, but it is still arbitrary. And just on the level of purely visual aesthetics I was heavily influenced by the early space age, when we started getting pictures showing how things in outer space actually look.

Oddly enough, by the way, I have never seen Hollywood successfully capture this look, especially the brightness of spacecraft in full sunlight at 1 AU. In some cases this may be because Hollywood loves gothicism in space, but I suspect the real reason is much more basic: no studio lighting is anywhere near as bright as direct sunlight.

(Although APOD doesn't mention it, the dim lighting of the spacewalk scene above suggests that it was taken while the ISS was passing over Earth's nightside, illuminated only by floodlights and such, not direct sunlight.)

Speaking of Hollywood spaceships, the Venture Star in Avatar may not be quite as realistic as it looks. Its appearance, with big radiator wings and so forth, is very much Plausible Midfuture, suggestive of a gigawatt-output nuclear electric power plant ... but the drive is capable of reaching relativistic speeds at more or less 1 g. I'm not gonna do the math here, but that drive is putting out waaaaay more than a gigawatt of thrust power. Those impressive-looking radiators couldn't shed much more than the ship's galley heat.
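
(For the curious, a rough back-of-the-envelope check - the ship mass and exhaust velocity here are pure assumptions for illustration, not figures from the film. Jet power is roughly one half of thrust times exhaust velocity:)

    m, a, ve = 1.0e6, 9.8, 3.0e7      # assumed: 1,000-tonne ship, 1 g, exhaust at 0.1 c
    thrust = m * a                    # ~9.8 meganewtons to hold 1 g
    jet_power = 0.5 * thrust * ve     # ~1.5e14 W, i.e. ~150 TW - five orders of magnitude past a gigawatt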

See? This is the sort of problem you get into when you start running the numbers for our favorite space scenarios. And the problems exist at multiple levels of meta-ness, from the sorts of technical issues I just mentioned up to (at least) the question of what relationship science fiction settings can or should have to 'the future.'



Discuss.

230 comments:

Tony said...

Raymond:

When I'm talking about software as a toy, I'm talking about your apparent preference for tools and methods that are full of philosophical goodness (for some not entirely undisputed definitions of "good"), but that don't stand up to the test of real world requirements that I'm professionally familiar with.

"They have the equivalent buy() and sell() methods..."

They don't have even approximately equivalent Portability and AcceptabilityInExchange attributes, however. Concrete commodities generally have low portability in comparison to their market value and limited acceptance in direct exchange. And where portability is high, acceptance in exchange is generally seriously limited by divisibility. Back in the days of precious metal money, how many people ever handled a gold piece in their lives? Usually a gold coin stored more value than the average person ever had at one time in liquid assets.

On the other hand, fiat money is highly portable and universally acceptable in exchange, but at the cost of being authority-dependent and having no utility besides exchange. I would say that that's certainly enough to justify placing it and any derived instruments in their own class.

Raymond said...

Tony:

(Quotes may be shortened to conserve posts.)

"I'm not concerned with the specific implementation of an enumerable (list-type) collection. Modern programming languages optimize the implementation to the task, and/or let the programmer choose which implementation he wants to use."

Let's assume I'm aware of the variety of implementations, as well as how mutable things can be under the hood even for immutable objects of bytecode languages. Let's also assume I've done some work with embedded systems, and can find my way around an instruction set.

I'm talking about overall system design, including languages, databases, and cluster architecture. List comprehension in the language, tail recursion in the compiler, map-reduce in the db cluster - as above, so below. And don't think I'm arguing for strictly pure functional programming, either - it's a tool like any other, and we're disagreeing on the extent of its usefulness.

"Time is never cheap. That's why languages like C# and Java provide char array-based stringbuilding tools, so that you don't have to copy a string every time you append to it. [...]"

And Python 3 has bytearray, which is similar to Java's StringBuffer. But note that the default string classes in Java, C# and Python are immutable. Mutable char arrays are the special-purpose exceptions, not the rule.
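
(A quick illustration of the distinction, in Python:)

    s = "abc"
    t = s
    s += "def"              # str is immutable: this builds a new object; t still sees "abc"

    b = bytearray(b"abc")
    c = b
    b += b"def"             # bytearray mutates in place; c sees b"abcdef" too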

Mutable data structures run into increasing trouble as concurrent execution increases, and the efficiency losses of immutable structures are frequently won back and then some by decreased complexity and increased thread safety in highly concurrent environments.

Everything you're saying is true - for single-threaded execution. But we don't live in that world anymore.

"Similarly, memory is never cheap. [...] Copying arrays, lists, whatever all over the place doesn't help."

What, you think the entire list or hash table gets copied in functional languages? Not a chance. You'd be surprised how lean those implementations can get. (Also, I thought we weren't talking about specific implementation details - or else we'd have to discuss memory paging in various OSes, CPU cache lines, and hash table expansion optimization, too.)

"Serializing data across the network is an entirely different issue from what I'm talking about."

"Storage and memory are not the same thing. I'm talking about memory, which is always a limited resource, and within which you don't have to accept copying on mutation if you don't want to."


I wasn't talking just about serialization or storage, but about the larger system design when dealing with distributed systems (and any other non-uniform non-contiguous memory setup), from blade servers to beowulf clusters. The bigger and more spread-out the system gets, the less it resembles OOP and relational DBs, and the more it tends to use functional programming and key-value stores (these days, anyways) for their greater concurrency resilience.

Raymond said...

Tony, cont'd:

"I'm not sure what the big deal is here. At most you have to check type to figure out what to call. Most of the time, if you've done your design correctly, you don't even have to do that."

Bear in mind this was fifteen years ago, and before I'd gotten into indirect references (which are always a little tricky in C++, at least at first). Like I said, I know how to do it, but it always struck me as odd how seemingly difficult it was to pass functions as data (which, to me, is the most natural thing in the world).

And now that I know much more, I have to say there are many tasks made so much cleaner with lambdas and list comprehensions.

The task in my early case - specifically configuring a menu from a file, the kind of thing you play around with when first learning I/O - can be done in 3-4 (clean) lines of Python, including exception handling. Foreach is my friend - and should be implemented everywhere. It's much harder to do without duck typing and proper reflection.
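
(A sketch of what I mean - the comma-separated "label,command" file format here is just an assumption for illustration:)

    try:
        with open("menu.cfg") as f:
            menu = [line.strip().split(",", 1) for line in f if line.strip()]
    except IOError:
        menu = []    # missing or unreadable file: fall back to an empty menu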

"Rigid? Yes. But they're rigid precisely because you are trying to make your code more robust and durable. So I really can't in any way accept that "brittle" accusation."

"Because not every -- in fact, not most, in my experience -- of the interfaces you build need or even want unpredictable data. And even at that a good half of the code I right is purely defensive, validating the data tye and going from there to validate that it is within domain and range."


Contracts are supposed to make the supplier's job easier - that's just a shift of debugging responsibility. The state preservation concerns fade (somewhat) when using immutable structures, and the input concerns are loosened with duck typing (and reflection, which usually goes with it).

I say "brittle" because while the code itself may be robust from the supplier's POV, it fails hard (by design) quite easily when using it as a consumer or intermediary. I don't like the "fail hard" philosophy in most cases.

I also say "brittle" because of the extra taxonomic effort required to use the code properly - and if your taxonomy is looser (or simply different), you get screwed rather easily.

Besides - what's the point of contracts if you have to write all that defensive code, anyways?

"Also, with heterogenous data, you don't try to analyze it beyond verifying general type. You rely on the supplying user or application to give you a reasonable amount of metadata in a known format."

"Most of what you need to know, you have to know ahead of time. Even with runtime type checking, you have to know what types to check for and what to do with them given you current state. And knowing what to do WRT state requires you to know at design time what kind of states, both in kind and degree, that you can get yourself into. More on this in a bit."


State gets easier to handle the more things you make immutable. The nice thing about those immutable containers is a marked indifference to state.

And frankly, duck typing is more than just runtime type checking - it also allows for a degree of class indifference without the overhead of ensuring the proper inheritance elsewhere.

"Like I said, we'd get back to this. There is a whole list of portability vs. utility issues that programmers have to deal with. Generally we try to put the unpredictable or at least changeable into data. [...]"

IMHO it's not enough for a well-designed system to allow for user-defined data - it should have some mechanism for user-defined functions, as well. In my case, there are plenty of basic filters and aggregators I can use (on one of my systems, anyways), but the reason I have to dump things into Excel is to get derived data. I'm not expecting Turing-complete embedded languages, but some basic transformations would be nice. But see below.

Raymond said...

"So, if your apps don't give you say, five user-defined attributes on each entity, then I would agree that they're poorly designed. But poor design is not the fault of the tools, which pretty much support anything you want to do, if you want to put the effort into it."

They don't, sadly. 2 of the 3 systems allow for a single comment field. And yes, that part's just poor design. But I see that poor design stemming in some fashion from an obsession with schemas and taxonomy which frequently accompanies OOP and RDBs. Yes, it can be worked around with sufficient effort, but the harder you make it the less common it'll be.

"Also, your software systems presumably persist data in accessible industry standard database formats. What's keeping you from writing local utilities to fill in the holes, leveraging the data you already have in your existing systems?"

Ha! I wish! The closest thing to "industry-standard" is the ordering system based on SAP (shudder - speaking of One True System pretensions), and that's wrapped in an opaque (and nearly useless) web interface. From the catalogue, it's all stored in proprietary blobs. The inventory management/billing/accounting/nose-trimming system has a decent report generator, but everything's manually requested through a terminal interface and spit out as Excel or CSV. Also, there's no input function whatsoever (this I can understand a little better, given the accounting functions, but it's still a PITA).

I've written what I can already, but the functions I want have to do with doing things as opposed to figuring out what to do (ie modification instead of reporting).

"It's generally not through negligence or incompetence that it happens, however. Everything has its limits. The more general a tool you try to build, the less of a fit it makes in many specifics. It's a balancing act. I think you need to appreciate that."

"General utility inherrently conflicts with local completeness. As stated above, it's a balancing act. And, as stated above, there's nothing keeping local programmers from leveraging and extending general systems, as long as they're built with industry standard compnents."


Well, for starters, the extensibility has to be deliberately designed in from the beginning. This can only be done by the designers of the end-user UI, of course, no matter how flexibly the core functions are written.

"BTW, I do agree with you about the arrogance of One True System architectures. But that isn't me, or most programmers I know. So let's not broad-brush things."

Understood, and I wasn't trying to imply you personally espouse any such thing. (If I gave that impression, my apologies.) I do, however, think that attitude is something of a natural extension of the principles underlying OOP and RDBs, and especially how both are seen in the corporate coding community (perhaps more in the higher echelons, but still).

"But those industry standards don't come out of nowhere. They come out of what works. OOP works. Relational DB systems work."

And the industry is changing, with even a lot of the big players questioning whether OOP and RDBs work for everything, or even most things.

Raymond said...

"You could accuse me of aesthetic prejudices, but you'd be wrong. I use the tool that works for the job. If I have to write tight, efficient server-side code, I use a compiled, object-oriented language, simply because that's the best tool. I've also written website code for money in PHP, which is essentially a scripting language. (Albeit one with a lot of under-the-hood power thanks to the implementation of the runtime engine and specialized capabilities (e.g. DB access) in C++.) If I have to do stuff client-side, then I use JavaScript. If I can get content from real-time feeds, I go on the web and get it. If I have to store periodic updates in the database, then I do that."

And I do the same (less often than you, since I'm not a programmer by profession). But we were speaking of how we saw the world. You look for what something is, and from there determine what can be done with it, where I look at what can be done with it, and from there determine what it is.

"When I'm talking about software as a toy, I'm talking about your apparent preference for tools and methods that are full of philosophical goodness (for some not entirely undisputed definitions of "good"), but that don't stand up to the test of real world requirements that I'm professionally familiar with."

It's only a toy to me because it's not how I make my living. I doubt I could convince you otherwise, but bear in mind that my aesthetic dislike of OOP stems from my early exposure to it, long before I had any idea of lambda functions. It's always seemed clumsy and verbose and inefficient to me. Just how I see the world.

"They don't have even approximately equivalent Portability and AcceptabilityInExchange attributes, however. Concrete commodities generally have low portability in comparison to their market value and limited acceptance in direct exchange. And where portability is high, acceptance in exchange is generally seriously limited by divisibility. Back in the days of precious metal money, how many people ever handled a gold piece in their lives? Usually a gold coin stored more value than the average person ever had at one time in liquid assets."

"On the other hand, fiat money is highly portable and universally acceptable in exchange, but at the cost of being authority-dependent and having no utility besides exchange. I would say that that's certainly enough to justify placing it and any derived instruments in their own class."


I beg to differ.

Portability for monetary instruments is more, well, abstract, but it's definitely a property. Different forms of money have different required steps to release - cash in hand is more portable than T-bills in an investment account. You could easily use the accessibility (in terms of required advance notification and regulatory considerations) as your basis. Depending on your setup, we can also bring in liquidity constraints (unless we're doing the same with physical commodities, in which case that's another similar property between the two).

AcceptabilityInExchange differs between cash, debit cards, credit cards, cheques, certified cheques, bank drafts, et cetera ad nauseam. No forms are accepted everywhere under every condition (cash, for example, has upper limits due to money laundering regs). So I'd say that's another property shared.

Raymond said...

Phew. Sorry for the textwalls.

Tony said...

Raymond:

"And don't think I'm arguing for strictly pure functional programming, either - it's a tool like any other, and we're disagreeing on the extent of its usefulness."

Oh absolutely. A procedure is still fundamentally a procedure. Our disagreement is over whether data should define the functions associated with it, or whether all data should be equally accessible to functionality. Given that even in the most structured, function-oriented system, you still have to write a lot of type-specific code for it to mean anything, we're engrossed in a dispute over how to factor the world. Or at least that's what I think I'm seeing here.

"And Python 3 has bytearray, which is similar to Stringbuffer. But note that the default string classes in Java, C# and Python are immutable. Mutable char arrays are the special-purpose exceptions, not the rule."

Maybe it's just the nature of what I do, but I hardly consider array-based stringbuilding "special-purpose". In the software I work on, large strings -- dynamically constructed web pages -- are built a few hundred to a few thousand bytes at a time, with as many as a hundred append operations before the result is shipped. At the top level, where the web page is assembled out of components subassembled by modules, one could easily copy 30k from one buffer to the next in order to add a few hundred bytes. On a web server constructing hundreds of thousands of responses a second, that could get to be unmanageable real fast.
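
(The same pattern in Python, for concreteness - components and render() are hypothetical stand-ins: collect the pieces, then concatenate exactly once, instead of copying the accumulated buffer on every append.)

    chunks = []
    for part in components:            # each module contributes a few hundred bytes
        chunks.append(render(part))    # O(1) append; nothing already collected is copied
    page = "".join(chunks)             # one final pass builds the response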

"Mutable data structures run into increasing trouble as concurrent execution increases, and the efficiency losses of immutable structures are frequently won back and then some by decreased complexity and increased thread safety in highly concurrent environments."

Well, you see, that's one of the advantages of relational database tools. You can handle concurrency issues by being database-centered and specifying maximum ACIDity. A good DBMS will even be able to keep memory-resident data concurrent through flexible locking and transparent updates to storage media. So I pretty much just don't worry about it.
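
(A minimal sketch of what I mean, with sqlite3 standing in for a production DBMS and a made-up parts/orders schema - the database serializes the writers so the application code doesn't have to:)

    import sqlite3

    conn = sqlite3.connect("inventory.db", isolation_level="IMMEDIATE")
    with conn:    # one transaction: both statements commit together or roll back together
        conn.execute("UPDATE parts SET qty = qty - 1 WHERE id = ?", (42,))
        conn.execute("INSERT INTO orders (part_id) VALUES (?)", (42,))
    conn.close()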

If I were working on real time systems, that might be different. But then in RTSs you generally don't want a lot of copying going on, because that eats up time in a system with hard decision deadlines.

"Everything you're saying is true - for single-threaded execution. But we don't live in that world anymore."

Plenty of people live in that world. Most programmers still do, because they're working a data repository model where any subprocess they write is all happening on a single thread. You're taking what a minority of programmers who work on big systems are doing, and acting like all of us out here that work for small and medium sized businesses are doing the same thing.

Tony said...

Raymond:

"What, you think the entire list or hash table gets copied in functional languages? Not a chance. You'd be surprised how lean those implementations can get. (Also, I thought we weren't talking about specific implementation details - or else we'd have to discuss memory paging in various OSes, CPU cache lines, and hash table expansion optimization, too.)"

I really don't know what happens in functional languages. They've always been below my radar. I know from reading that one of the maxims is hand a caller his own copy, don't let him mutate the original. Well, if the caller is iterating through a list and modifying every value, he has to get a copy of the whole list, right?

Also, you know how easy it is to overrun a gigabyte buffer? Characterize the relationship between the approximately 2,500 NWS weather stations and the 36,000 largest cities in the US (by population), using distance as the relationship. Use only 12 bytes per record -- one 32 bit integer each to identify a station and a city, plus one IEEE 32 bit floating point number to record the distance.
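
(The arithmetic, in Python for anyone following along:)

    stations, cities, bytes_per_record = 2500, 36000, 12
    total = stations * cities * bytes_per_record    # 1,080,000,000 bytes
    print(total / 2.0 ** 30)                        # ~1.01 GiB - just past that gigabyte buffer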

That's a real world type of problem that programmers like me have to think through and get right every day.

"I wasn't talking just about serialization or storage, but about the larger system design when dealing with distributed systems (and any other non-uniform non-contiguous memory setup), from blade servers to beowulf clusters. The bigger and more spread-out the system gets, the less it resembles OOP and relational DBs, and the more it tends to use functional programming and key-value stores (these days, anyways) for their greater concurrency resilience."

Once again, you're talking about what the 1,000 PhDs at Google are doing, while there are hundreds of thousands of us out here just trying to deliver usable data from a database to thousands of individual customers less than 100k at a time, usually in a web browser.

Gotta run, I'll pick this back up in a while.

Tony said...

Raymond:

"Bear in mind this was fifteen years ago, and before I'd gotten into indirect references..."

Well, if you're working in Java-style languages, everything is a reference. Actually, anything that does runtime type discovery works that way, because any variable is just a pointer to a hidden structure of a concrete type that is instantiated at runtime based on the data you are trying to store in it.

"And now that I know much more, I have to say there are many tasks made so much cleaner with lambdas and list comprehensions."

Lambdas have their uses. Microsoft's LINQ technology uses them a lot. But I'm not convinced they're the one true functional model, if you know what I mean. ;-)

"Foreach is my friend - and should be implemented everywhere. It's much harder to do without duck typing and proper reflection."

Foreach is a pretty standard language feature these days. But here's where contracts are necessary. A collection has to provide an iterator that the language can use. In C#, for example, enumerable collections have to provide a concrete implementation of the IEnumerable interface to be called in a foreach loop. I'm pretty sure any language that implements foreach has similar mechanisms and requirements.

"Contracts are supposed to make the supplier's job easier - that's just a shift of debugging responsibility..."

I really can't figure out what you mean by this. The whole point of programming by contract is to keep modularized code from falling apart through interface changes. In meeting a contract, you still have to make it all work on your side of the interface, whether you're developer or the user of a module.

"I say 'brittle' because while the code itself may be robust from the supplier's POV, it fails hard (by design) quite easily when using it as a consumer or intermediary. I don't like the 'fail hard' philosophy in most cases."

Then you must really hate XML. It's explicitly specified that its parsers must always fail if the document is not well-formed, without exception. Now, there are real reasons to be cautious with XML, because it is verbose and otherwise inefficient (e.g. storing numbers in strings). But the all-or-nothing well-formedness constraint is a very useful thing indeed.

Another thing about XML -- producers have to be schema-strict or consumers can't be efficiently automated. That's just another form of programming contract, at the data model level.

"I also say 'brittle' because of the extra taxonomic effort required to use the code properly - and if your taxonomy is looser (or simply different), you get screwed rather easily."

You're going to have to explain that to me.

"Besides - what's the point of contracts if you have to write all that defensive code, anyways?"

You can enforce contracts relatively easily between internal code modules or in data interchange formats for automated production and consumption. But humans will screw you every chance they get. And most of the input nowadays comes from users who are not trained in the requirements of the system, like old school data input operators used to be. So you have to validate, validate, validate.
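
(A bare-bones Python sketch of the kind of defense I mean - the quantity field and its limits are made up:)

    def parse_quantity(raw):
        try:
            qty = int(raw)                  # validate type first
        except (TypeError, ValueError):
            raise ValueError("quantity must be a whole number")
        if not 0 < qty <= 10000:            # then domain and range
            raise ValueError("quantity out of range")
        return qty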

Tony said...

Raymond:

"State gets easier to handle the more things you make immutable. The nice thing about those immutable containers is a marked indifference to state."

Uhhh...no. You just shift responsibility for correct state from one place to another. Whether you have a type that has all of the code self-contained in class methods, or you have simple structures that are manipulated by outside code, you eventually get to the point where you have to mutate the underlying reference data.

"And frankly, duck typing is more than just runtime type checking - it also allows for a degree of class indifference without the overhead of ensuring the proper inheritance elsewhere."

One still has to ensure that the proper operation is being done on the proper data. For example, let's say we have objects representing Planes, Trains, and Automobiles. You can either have a Plane, Train, or Automobile class each defining its own methods. Or you can have a master function for any given operation, but it has to have an "if (Plane) do X", "if (Train) do Y", "if (Automobile) do Z" decision structure.

Or, if you're just looking for exposed methods by name, you still have to know at design time what methods you want to invoke, and what their effects will be, given the (allegedly anonymous) data type. The Go() method for Planes, Trains, and Automobiles, for example, can each respectively have widely varying effects on the state of the environment. So, yay for reflection and all that happy horsesh!t, but you still have to be type conscious.
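
(Both factorings in miniature - a Python sketch for brevity:)

    class Plane(object):
        def go(self): return "takes off"

    class Train(object):
        def go(self): return "departs the station"

    def run(vehicle):
        return vehicle.go()    # factoring 1: each type carries its own behavior

    def run_switch(vehicle):
        # factoring 2: one master function with a type-decision structure
        if isinstance(vehicle, Plane):
            return "takes off"
        elif isinstance(vehicle, Train):
            return "departs the station"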

"IMHO it's not enough for a well-designed system to allow for user-defined data - it should have some mechanism for user-defined functions, as well..."

Well, here's where the data should be in industry standard stores with read-only side access allowed (at least to the people with the proper database credentials). With that you can define any extensions you want that aren't intended to produce side effects. And you can use any language that floats your boat, from Perl to Visual Basic. That should be flexible enough for the local user and safe enough to satisfy the system architect.

Tony said...

Raymond:

"They don't, sadly. 2 of the 3 systems allow for a single comment field. And yes, that part's just poor design. But I see that poor design stemming in some fashion from an obsession with schemas and taxonomy which frequently accompanies OOP and RDBs. Yes, it can be worked around with sufficient effort, but the harder you make it the less common it'll be."

Like I said, it's a balance between general and specific utility. You usually aim to satisfy something like the 95th percentile consensus feature set. That alone can be a lot harder than you seem to think. I think the flavor of the tools in use has very little to do with it.

"Ha! I wish! The closest thing to "industry-standard" is..."

Ugh. I feel for ya, but that sounds more management-paranoia-driven than anything to do with technology choices. A lot of software providers are nutso about not letting the end user see the data model because they're afraid that they're giving away the proprietary farm if they do.

IMO that means they don't have much value in their system to begin with if they do that. But that's hardly universal. I once helped select and install a burglar/fire alarm monitoring system where the vendor not only supplied the complete data model in a (surprisingly readable) printed spec, but included instructions for how to read live production data using Access and ODBC. (I personally think they were too lazy to write a comprehensive reporting module, but, whatever...it was highly useful the way it was.) In another security management system I am familiar with, the data was stored in Btrieve tables and you could get at it with the standard tools if you knew how.

"I've written what I can already, but the functions I want have to do with doing things as opposed to figuring out what to do (ie modification instead of reporting)."

Ummm...you don't want to modify production data with outside tools. If you want to extend it and work on the extended data, that's fine. Goofing around with copies is okay too. But even an expert DBM with all sorts of pro tools can't tell you how or why the data got to be the way it is without knowing the source code of the relevant system functions.

"Well, for starters, the extensibility has to be deliberately designed in from the beginning. This can only be done by the designers of the end-user UI, of course, no matter how flexible the core functions are written."

At a certain level, yes. But extensibility will always be a second-class citizen to consistency and validity. Don't rage against the machine on that account. You'd always be wrong. And, as I stated above, with a reasonable amount of access to your own data -- which it is, even if a proprietary software system put it there -- you can manage pretty much all of the extensibility you need.

"Understood, and I wasn't trying to imply you personally espouse any such thing."

I didn't think so. Just marking my territory.

"I do, however, think that attitude is something of a natural extension of the principles underlying OOP and RDBs, and especially how both are seen in the corporate coding community (perhaps more in the higher echelons, but still)."

Sorry, I'm not convinced of that. I think it has way more to do with tightly controlling projects so they get done. The worst thing that can happen is "second system syndrome", where a workable, if general, system tries to be all things to all people in the next iteration. See Fred Brooks, The Mythical Man-Month, for a case study of exactly this happening to IBM with the System/360 architecture. It's older than moonwalking.

"And the industry is changing, with even a lot of the big players questioning whether OOP and RDBs work for everything, or even most things."

What big players are those?

Tony said...

Raymond:

"And I do the same (less often than you, since I'm not a programmer by profession). But we were speaking of how we saw the world. You look for what something is, and from there determine what can be done with it, where I look at what can be done with it, and from there determine what it is."

I wanted to say this earlier, but I decided to listen more to what you had to say to be sure: you're erecting a false dichotomy here. Whether you store functionality in class methods or in general program functionality, you still have to be type aware, or you're going to do something with side effects you can't predict and handle. If you think you don't, that's because you're relying on an implicit contract with the object that it won't go off the reservation if you call its methods.

Not liking explicit contracts and explicit object orientation is nice and all, from some philosophical perspectives, but it all falls apart if the implicit contracts that those conventions are based on aren't followed. Like I said earlier, we're both factoring the same world, and have to live with its facts and relationships.

"It's only a toy to me because it's not how I make my living. I doubt I could convince you otherwise, but bear in mind that my aesthetic dislike of OOP stems from my early exposure to them, long before I had any idea of lambda functions. It's always seemed clumsy and verbose and inefficient to me. Just how I see the world."

I hope I've made this point well enough earlier, but to summarize, you still have to write the code to handle type-specific behaviors somewhere. How you factor that out may in some ways be a personal preference. But I can tell you from the industrial design and maintenance perspective, having types closely bound to their methods is way easier to work with on large code sets.

"I beg to differ..."

Hmmm... Point.

However, fiat money and its instruments simply don't exist except as means of exchange and/or stores of value. Commodities exist whether they are used in exchange or in other ways. From a programming perspective, I would say that fiat money and commodities don't descend from the same parent, but that they each implement some of the same interfaces, with one very important distinction. Fiat Money implements the IMeansOfExchange interface as its primary function, while commodities implement that same interface as an additional function.

Raymond said...

Tony:

(Note: please forgive the achronal order, and I apologize for the textwall to come.)

"...we're engrossed in a dispute over how to factor the world. Or at least that's what I think I'm seeing here.
"


Yeah, that's pretty much how I see it too.

"However, fiat money and its instruments simply don't exist except as means of exchange and/or stores of value. Commodities exist whether they are used in exchange or in other ways. From a programming perspective, I would say that fiat money and commodities don't descend from the same parent, but that they each implement some of the same interfaces, with one very important distinction. Fiat Money implements the IMeansOfExchange interface as its primary function, while commodities implement that same interface as an additional function."

Financial instruments still often have a physical component (cash, cards, share certificates), and since we shouldn't ignore aesthetic utility (nor crude tangible utility as fuel), many child classes of FinancialInstrument will have at least limited versions of the physical properties that the Commodity class has. I think we should have an ExchangableItem parent class for both, somewhere up the inheritance chain - it's easier to have null properties and methods in the entirely-abstract subclasses than to duplicate the relevant code.
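
(Sketched out in Python - the names and properties are placeholders, not a worked design:)

    class ExchangableItem(object):
        """Shared parent up the chain: exchange methods plus nullable physical properties."""
        mass_kg = None                                   # null on the purely abstract branch
        def buy(self, qty): raise NotImplementedError
        def sell(self, qty): raise NotImplementedError

    class Commodity(ExchangableItem):
        pass    # tangible: fills in the physical properties

    class FinancialInstrument(ExchangableItem):
        pass    # cash and share certificates keep a little mass; T-bills leave it null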

"You're erecting a false dichotomy here. [...]"

"Not liking explicit contracts and explicit object orientation is nice and all...Like I said earlier, we're both factoring the same world, and have to live with its facts and relationships."

"I hope I've made this point well enough earlier, but to summarize, you still have to write the code to handle type-specific behaviors somewhere. How you factor that out may in some ways be a personal preference. But I can tell you from the industrial design and maintenance perspective, having types closely bound to their methods is way easier to work with on large code sets."


Let me perhaps clarify a misconception which seems to have snuck in somewhere: I am not, in any way, arguing against types, the concept of types, the necessity of types, or the requirement for type-handling code. Types are pretty fundamental to working anywhere above assembly, and crucial to doing anything interesting. What I'm arguing for, specifically, is (my preference for) a more flexible handling of types, compared to the often-rigid and highly taxonomical approach of OOP.

"Whether you store functionality in class methods or general program functionality, you still have to be type aware, or you're going to do something with side effects you can't predict and handle. If you think you don't that's because you're relying on an implicit contract with the object that it won't go off the reservation if you call its methods."

When speaking of functional programming, there aren't really objects as you're used to, nor do they have methods to call, nor do they have persistent state to change. You're arguing the difference between object-oriented and straight-up imperative programming, which is not what I'm talking about at all.

Raymond said...

"Maybe it's just the nature of what I do, by I hardly consider array-based stringbuilding "special-purpose". [...]"

I'll go into more detail below, but functional languages don't copy everything around each time - only occasionally under the hood, and then they usually use (temporary) mutable structures when appropriate, but that's a compiler/VM matter. And really, when building a string for output, the process should happen only once (when writing to the output buffer). Collect your substrings and concatenate them into the buffer directly, or into a chunk of memory to be handed over to the kernel (depending on exact I/O implementation and platform). Any functional language with even halfway decent I/O libraries will be able to do that multiple-concatenation as a single compound operation (usually using tail recursion).

"Well, you see, that's one of the advantages of relational database tools. You can handle concurrency issues by being database-centered and specifying maximum ACIDity."

Ugh. I know how important ACID can be, but setting it to max is the worst thing for performance. It's trying to protect against concurrency, instead of trying to use it to your advantage. And yes, I know how hard it is to make a relational model properly concurrent, but I think that's why key-value stores are gaining some traction among younger web companies (who don't have legacy code and data standing in the way). Take a look at CouchDB or MongoDB for what I'm talking about. (Or Hadoop, but nobody really uses that.)

"Plenty of people live in that world. Most programmers still do, because they're working a data repository model where any subprocess they write is all happening on a single thread."

And therein lies the problem, AFAICT. I think we need more (properly) multiprocess datastores, especially smaller ones, and easier tools for the majority of programmers to add concurrency without having to build everything themselves.

Erlang, as an example, is great in this respect - built-in message-passing and remote calls, an efficient concurrency model (shared-nothing microthreads) and easy event handling. (I need to do more with Erlang, myself.) And this isn't some hacker toy; it was made by a telecom (Ericsson) for use in their network infrastructure. Oh, and did I mention it has hot-swappable code?

For another example, as much as I dislike Apple, I also have to admit that Grand Central Dispatch (and its attendant extensions to Objective-C) is both powerful (Apple basically put closures in C) and easy to use (the syntax is quick and clear).

"You're taking what a minority of programmers who work on big systems are doing, and acting like all of us out here that work for small and medium sized businesses are doing the same thing."

Things being done on big systems have a distinct tendency to filter down to smaller ones. Everything from multiuser environments to RDBs to multicore to RAID (and I'm not listing even close to everything) gets figured out on big systems, and if it's useful gradually makes its way down to smaller ones.

Raymond said...

"I really don't know what happens in functional languages. They've always been below my radar. I know from reading that one of the maxims is hand a caller his own copy, don't let him mutate the original. Well, if the caller is iterating through a list and modifying every value, he has to get a copy of the whole list, right?"

No, it's usually pass-by-reference, made safe(r) by immutability. Pure functional languages will have their collection data structures based around linked lists, usually, which makes them easy to modify nondestructively (elements replaced in the new list can be accessed by the old container, and are still valid as long as the old container is still in use). Temporary mutable constructs may be used under the hood, depending on the compiler or VM, and are definitely used in tail recursion (which, unfortunately, the Java VM can't do properly, although the CLR can). So if you're mutating each element of a list, you'll read each element of the old one as you're constructing the new (and then the old is garbage-collected if no longer needed).

Hybrids (Python most notably, but also increasingly C#) will usually have a choice of mutable or immutable containers. Both are still passed by reference.
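
(A toy version of that structural sharing, with cons cells as immutable 2-tuples:)

    def cons(head, tail):
        return (head, tail)                      # an immutable cons cell

    old = cons(1, cons(2, cons(3, None)))        # the list 1 -> 2 -> 3
    new = cons(99, old[1])                       # "replace" the head: 99 -> 2 -> 3
    # old is untouched, and the 2 -> 3 tail is shared between both lists, not copied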

"Uhhh...no. You just shift responsibility for correct state from one place to another. Whether you have a type that has all of the code self-contained in class methods, or you have simple structures that are manipulated by outside code, you eventually get to the point where you have to mutate the underlying reference data."

Not in functional languages, not really. Erlang and Haskell are single-assignment (and the Lisp/Scheme families encourage the same discipline), which means there is no mutated data at the programming level (compiled machine code is another story). "State" means something entirely different. The closest you get is monads, which...well, I'm probably not the best one to explain them. But fundamentally, operations which change the state of a variable are explicitly forbidden - everything's constructed in terms of functions and their returned values.

"One still has to ensure that the proper operation is being done on the proper data.[...] Or, if you're just looking for exposed methods by name, you still have to know at design time what methods you want to invoke, and what their effects will be, given the (allegedly anonymous) data type. [...] but you still have to be type conscious."

See above re: type considerations. Not what I'm arguing.

"Also, you know how easy it is to overrun a gigabyte buffer? Characterize the relationship between the approximately 2,500 NWS weather stations and the 36,000 largest cities in the US (by population), using distance as the relationship. Use only 12 bytes per record -- one 32 bit integer each to identify a station and a city, plus one IEEE 32 bit floating point number to record the distance."

Why not cut that by two-thirds and put the distances in a 2d array? Or better yet a 4d array, with cities and stations each presorted by X and Y, and along with a precomputed closest-node hint table you get bidirectional nearest-neighbor search for cheap?

Or if you're really starved for memory, use nested B+ trees or R-trees and store the coordinates, recalculating as needed? (This of course presumes you have those extra cycles to spare, and really need that memory footprint down.)

I obviously can't say for sure what form your data should take, though - I don't know what you're trying to do with it.
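
(To make the 2d-array option concrete - a NumPy sketch where the array indices themselves serve as the station and city IDs, which is what lets the two 4-byte ID fields disappear:)

    import numpy as np

    # 2,500 stations x 36,000 cities at one 4-byte float each: ~360 MB, a third of the original
    distances = np.zeros((2500, 36000), dtype=np.float32)

    distances[7, 1234] = 42.5    # station 7 to city 1234
    d = distances[7, 1234]       # lookup by position; no ID columns stored at all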

Raymond said...

"Once again, you're talking about what the 1,000 PhDs at Google are doing, while ther are hundreds of thousands of us out here just trying to deliver usable data from a database to thousands of individual customers less than 100k at a time, usually in a web browser."

And I hold a certain trepidation about said kilodoctorates and their ability to make tools usable by the majority of programmers. I think that's what Google's trying to do with Go and Dart, but we'll see - their track record of turning neat toys into usable code for the masses is sketchy. It's a matter of not only education, but availability of proper usable tools. (So don't think I'm just saying "catch up, damn you" or anything like that.)

"Well, if you're working in Java-style languages, everything is a reference. Actually, anything that does runtime type discovery works that way, because any variable is just a pointer to a hidden structure of a concrete type that is instantiated at runtime based on the data you are trying to store in it."

All this I know. Specifically, I was using C++. I don't know how much you've worked with C++, but I've found that indirect references can be tricky to learn and syntactically quirky (esp. when you're responsible for managing your own memory).

"Lambdas have their uses. Microsoft's Linq technology uses them a lot. But I'm not convinced they're the one true funtional model, if you know what I mean. ;-)"

I rather like LINQ, actually, and MS implemented the parallel version (PLINQ) in v4. And no, lambdas aren't the One True Anything (despite what the inventors of lambda calculus would claim). They're simply very, very useful IMHO.

"Foreach is a pretty standard language feature these days. But here's where contracts are necessary. A collection has to provide an iterator that the language can use. In C#, for example, enumerable collections have to provide a concrete implementation of the IEnumerable interface to be called in a foreach loop. I'm pretty sure any language that implements foreach has similar mechanisms and requirements."

You don't need contracts for that. In Python, if the class has a next() method, you can iterate it. Yes, you should really make sure your next() method works properly, but it's a simple interface to implement - call next(), get a cookie, repeat as required. And testing for the interface is simple, too, as it's just checking for a next() method using the reflection tools.
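
(Strictly, the for loop also asks the object for __iter__ first - but that's usually just "return self". A sketch, in Python 2 spelling since that's the next() I mean:)

    class Countdown(object):
        def __init__(self, n):
            self.n = n
        def __iter__(self):
            return self          # what the for loop asks for
        def next(self):          # spelled __next__ in Python 3
            if self.n <= 0:
                raise StopIteration
            self.n -= 1
            return self.n + 1

    values = [i for i in Countdown(3)]    # [3, 2, 1] - no interface declaration anywhere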

Contracts are (in my understanding) essentially a formalism of the interface specs you need to use the code anyways. I'm willing to admit they make automated unit testing easier, but I (and a clutch of other programmers I read) remain suspicious of unit testing's overall worth, especially in high-concurrency environments prone to schrodibugs (and other errors difficult to catch at compile-time).

"I really can't figure out what you mean by this. The whole point of programming by contract is to keep modularized code from falling apart through interface changes. In meeting a contract, you still have to make it all work on your side of the interface, whether you're developer or the user of a module."

I mean that it seems to me that it's nothing more than shuffling validation requirements and state-change risks onto the caller. Preconditions are easier to validate than postconditions, after all (if it were otherwise, we could solve the Halting Problem in the general case).

Raymond said...

"Then you must really hate XML. It's explicitly specified that its parsers must always fail if the document is not well-formed, without exception. Now, there are real reasons to be cautious with XML, because it is verbose and otherwise inefficient (e.g. storing numbers in strings). But the all-or-nothing well-formedness constraint is a very useful thing indeed."

You have no idea of the deep and abiding hatred I harbor for XML and all it represents. Ugly, inefficient, convoluted, fragile, picky, quirky, bloated, verbose, and combined with (Turing-complete!) XSLT, the standard bearer of One True System Syndrome. The funny thing is, somehow it got enough traction to make a halfway-verifiable claim to having achieved that exalted status. It's the fat, pestilent god-emperor of programming, and every day we sacrifice all too many fresh minds to its insatiable hunger.

I dearly wish to burn XML to the ground and stick its angle brackets on pikes across the land, as a warning to others.

(In all seriousness, though, give me three people and I'll show you five opinions on attributes vs. elements. And it's so ugly!)

"Another thing about XML -- producers have to be schema-strict or consumers can't be efficiently automated. That's just another form of programming contract, at the data model level."

There are ways to have a variable level of schema-strictness without the baggage (and with far more efficiency and flexibility) - take Google's Protocol Buffers for a (slightly esoteric) example, which as a format include enough schema-generation functionality to automate a good deal of input restriction, but are still very capable of setting aside unknown or obsolete portions and letting the application handle them (go for a spin in the code, actually - they pull some pretty neat metaprogramming tricks).

""I also say 'brittle' because of the extra taxonomic effort required to use the code properly - and if your taxonomy is looser (or simply different), you get screwed rather easily."

You're going to have to explain that to me."


That comment was directed at OOP overall, more than just contracts per se. I've heard it said, and believe to some extent, that OOP is fundamentally infectious - any code you write which touches it ends up resembling it eventually. Interface with objects, and you need things closely resembling objects, which possess the same drawbacks, so you may as well objectify your code anyways. Unless you're dealing with objects that take in and spit out nothing but primitives or serialized interchange formats, you end up structuring your code around that interface; if your taxonomy is different, if you don't bend your inheritances to fit, you get more problems than it's worth to resist.

Also, let me reiterate that I don't like the fail-hard approach. Works well enough at compile time, if all your code is in the same building, but it tends to be opaque and annoying when piecing together code from multiple sources, and it's positively maddening when using external running programs which you can't peek inside.

"You can enforce contracts relatively easily between internal code modules or in data interchange formats for automated production and consumption. [...]"

Thus, I tend to see contracts as wasted effort more tied to internal corporate structure than end-user functionality. Might just be a matter of different environments and resulting perceptions.

"Sorry, I'm not convinced of that. I think it has way more to do with tightly controlling projects so they get done. [...]"

Fair point. It may boil down to an association with OOP and the tightly-controlled project culture which abuses it. I should probably separate those two a bit better in this discussion.

Raymond said...

"What big players are those?"

Oh, the usual: Google (MapReduce, GFS, etc.), Amazon (Dynamo, the distributed key-value store they use for pretty much their entire backend), Facebook (Thrift and Cassandra), Microsoft (LINQ and friends, the backend to Azure), Sun pre-Oracle-buyout (grid computing, ZFS), anyone using Hadoop (the above plus Yahoo, IBM, HP, Microsoft, and a gazillion others), and the ever-expanding roster of "NoSQL" datastores.

"Well, here's where the data should be in industry standard stores with read-only side access allowed (at least to the people with the proper database credentials). With that [...]"

Almost. It's harder to be satisfied with that when key parts of your dataset aren't your own, but pushed onto you on a regular basis. Being able to define, say, item-specific discounts as a function of list price (or cost price, or whatever) is something which has to be allowed internally, when your price list is updated monthly from the Fatherland, and whatever transformations you make by hand are overwritten.

"Like I said, it's a balance between general and specific utility. [...]"

This is why I was presenting my case as, well, only my case. There's always other stuff going on. More below.

"Ugh. I feel for ya, but that sound more management paranoia driven than anything to do with technology choices. [...]"

Oh, that's very much in play here, seeing as how I could probably replace 90% of the system in a month or two with something far better (the other 10% I'd have to have an accounting degree for).

"IMO that means they don't have much value in their system to begin with if they do that. But that's hardly universal. [...]"

They don't, IMO, and while I know it's not universal, it seems to be depressingly common.

"Ummm...you don't want to modify production data with outside tools. If you want to extend it and work on the extended data, that's fine. [...]"

I should clarify - I'm not asking for access to the underlying data so much as either a proper interface to it (maintaining consistency and validity as necessary - like I said, I sort of understand the reluctance to allow unfettered access to accounting data) or sufficient included functionality to implement it within the system.

For example, there's no good reason why my parts pricing has to be in the same database as the accounting records - and in fact, a few of the manipulations I have to do for practical reasons inherently damage the accuracy of the accounting data. There are a whole host of assumptions the designers made about day-to-day operations which don't hold, and I lack any decent mechanism for rectifying them. Then there's the interface problems, but who doesn't have those?

"At a certain level, yes. But extensibility will always be a second-class citizen to consistency and validity. [...]"

True enough in general - in my case, though, it's not that consistency and validity are prioritized (the backend hasn't changed much, if at all, in seven years or more) but an obsession by the supplier with webifying the creaking curses-based terminal interface and implementing what's essentially Excel in a browser (but read-only, of course) at the expense of improving the most common end-user functionality. If you're only looking for reporting, the tools at hand are pretty good - but the underlying business logic is limited to how the system designers thought a car dealership is supposed to work, as opposed to anything resembling how it actually does. I'm sure you're quite familiar with that class of ailment.

Tony said...

Raymond:

"(Note: please forgive the achronal order, and I apologize for the textwall to come.)"

All is forgiven. Please come home for Christmas. ;-)

"Financial instruments still often have a physical component (cash, cards, share certificates) which, since we shouldn't ignore aesthetic utility (nor crude tangible utility as fuel), many child classes of FinancialInstrument will have at least limited versions of the physical properties that the Commodity class has..."

That doesn't redeem legal tender and financial instruments from being created for the sole purpose of exchange or value storage. Commodities have exchange and value storage as a lesser included offense.

And that's about all I can say about that, having said it at least ten times.

"What I'm arguing for, specifically, is (my preference for) a more flexible handling of types, compared to the often-rigid and highly taxonomical approach of OOP."

I think I see what the misunderstanding is here. I don't think in terms of taxonomy. Objects are powerful tools for code organization and thought organization in general. I think a lot of writing and teaching on objects is disproportionately concerned with hierarchy and inheritance, and not enough concerned with objects as organizational tools.

In the code I write, I use inheritance very sparingly. To a degree that's forced on me, due to the heterogeneous nature of the data streams I implement into deployable information products. But I think I would anyway, because deep inheritance hierarchies cause more confusion than they're worth. More than two levels of inheritance, I think, and you wind up with too much taxonomic overhead that you have to handle at runtime. And I think top-level parent objects should always be implemented with pure virtual methods, unless you have a very strong reason to suspect that every derived class will have exactly the same behavior as the parent and grandparent.
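
To put that in code (a minimal C# sketch -- the names are invented):

    // Top-level parent as a pure contract: abstract ("pure virtual")
    // methods only, no inherited behavior to trip over.
    public abstract class DataStream
    {
        public abstract void Open(string source);
        public abstract string ReadRecord();
        public abstract void Close();
    }

    // One level down, each stream type supplies its own behavior.
    public sealed class CsvDataStream : DataStream
    {
        public override void Open(string source) { /* open the file */ }
        public override string ReadRecord() { return null; /* parse a row */ }
        public override void Close() { /* release the handle */ }
    }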

BTW, if you want to read a very useful and relatively inexpensive book on how objects work, using abstract data types as a conceptual framework (but including chapters on persisting to storage and other real-world applications), I can't recommend this book highly enough:

Data Abstraction and Problem Solving with C++: Walls and Mirrors (Carrano)

This was my textbook in data structures class. And I consider data structures -- no matter how you conceive of them -- foundational to understanding how to program for the real world.

"When speaking of functional programming, there aren't really objects as you're used to, nor do they have methods to call, nor do they have persistent state to change. You're arguing the difference between object-oriented and straight-up imperative programming, which is not what I'm talking about at all."

I get that. Like procedures, data structures are still data structures, no matter how you conceive of them. After all, programs are just algorithms plus data structures, so says the One True Prophet Niklaus Wirth.

Tony said...

Raymond:

"I'll go into more detail below..."

I understand all that. But each time you mutate something in the list (using lists as an example, but any large collection will do), you get your own copy of the data until you commit to replacing the original instance. If you mutate everything before committing the change, you have a completely recreated caller-specific list resident in memory alongside the original list.

Also, if you just leave the original list for garbage collection, you wind up invoking the garbage collector sooner, because you're polluting the memory space with more abandoned objects. Garbage collection is overhead you want to minimize. I've actually watched inefficient programs run and could tell in human time how often the garbage collection was running.
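
To make the trade concrete (a little C# sketch, names mine):

    using System.Collections.Generic;
    using System.Linq;

    static class Discounts
    {
        // In-place mutation: one list, nothing abandoned for the collector.
        public static void ApplyInPlace(List<decimal> prices, decimal factor)
        {
            for (int i = 0; i < prices.Count; i++)
                prices[i] *= factor;
        }

        // Functional style: every call allocates a fresh list, and the old
        // one becomes garbage the collector eventually has to sweep up.
        public static List<decimal> ApplyAsCopy(List<decimal> prices,
                                                decimal factor)
        {
            return prices.Select(p => p * factor).ToList();
        }
    }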

"Ugh. I know how important ACID can be, but setting it to max is the worst thing for performance."

Hey, let's take it as read that I implicitly mean "consistent with operational requirements" when I make an assertion about practices, k?

Sheesh.

"Take a look at Couchdb or Mongodb for what I'm talking about. (Or Hadoop, but nobody really uses that .)"

According to the couchdb overview doc, writes are designed to fail hard upon version conflicts, and require outside intervention to recommit upon failure. Why should I have to write my own concurrency handling procedures when the vendors are full of byteheaded nerds who love to think about that kind of thing? I just want the thing to store my data and not hiss and spit at me. The various relational implementations do that for me.

"And therein lies the problem, AFAICT. I think we need more (properly) multiprocess datastores, especially smaller ones, and easier tools for the majority of programmers to add concurrency without having to build everything themselves."

There are plenty of relational database technologies that can handle any magnitude of work. The scaling comes in choosing the right tools. If I'm building small website functionality, I choose something like MySQL, which is cheap and scales fairly well for low access rate apps. If I'm constantly in the data store, then I go with SQL Server or Oracle. That's the whole point of working from a data repository -- you concentrate on edge processes and let the code designed to handle data management grunt work just do it.

"Things being done on big systems have a distinct tendency to filter down to smaller ones..."

Remember what I said about byteheads? Business programmers just don't want to think about that stuff.

Tony said...

Raymond:

"Hybrids (Python most notably, but also increasingly C#) will usually have a choice of mutable or immutable containers. Both are still passed by reference."

In case I haven't made it clear, C# is (and has been for almost seven years) my primary working language. So I fully appreciate the differences between reference and value objects. But they're still both accessed by reference. Whether one is identified by content or by instance is more about design philosophy than it is about plumbing.

"But fundamentally operations which change the state of a variable are explicitly forbidden - everything's constructed in terms of functions and their returned values."

You can program like that in C#. The tools are there. But I wouldn't want to. I *like* being able to pass an object -- or a collection of them -- into a procedure and get back *that* set of instances, suitably mutated, if that's what I want. It's conceptually simple and about as efficient as you can get with memory space.

"Why not cut that by two-thirds and put the distances in a 2d array?..."

It was just an example of how big even highly abstracted real-world data sets can get almost without trying.

Here's the task: a master process passes me the integer id of a city, and I have to find the current weather from the closest weather station. So I built a list of weather stations in the database, each identified by an integer (supplied at creation time by the DBMS). Then I built a mapping table relating the two types of locations by distance.

Now, it turns out that finding the distance between two points on the surface of a sphere is more resource-intensive than you'd think. It's so expensive, in fact, that we don't even do it when finding geolocations of stuff near location X for map presentations. We calculate once the minimum and maximum latlongs for a box centered on X, and just compare them to the places-of-interest latlongs stored in the database. Yes, it's a fudge, and you get everything in a rhomboidal box rather than a circle, but it suffices.
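
The box itself is cheap to compute (a rough C# sketch; the miles-per-degree constant and all the names are mine):

    using System;

    struct LatLongBox
    {
        public double MinLat, MaxLat, MinLon, MaxLon;
    }

    static class GeoBox
    {
        const double MilesPerDegreeLat = 69.0; // rough, fine for a prefilter

        // Build a box centered on (lat, lon); candidate points are compared
        // against it instead of paying for great-circle math on every row.
        // (Breaks down near the poles, like the fudge it is.)
        public static LatLongBox Around(double lat, double lon,
                                        double radiusMiles)
        {
            double dLat = radiusMiles / MilesPerDegreeLat;
            double dLon = radiusMiles /
                (MilesPerDegreeLat * Math.Cos(lat * Math.PI / 180.0));
            return new LatLongBox
            {
                MinLat = lat - dLat, MaxLat = lat + dLat,
                MinLon = lon - dLon, MaxLon = lon + dLon
            };
        }
    }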

So that's why I have distance precalculated in my mapping table. It's expensive enough to calculate that you just don't want to do it if you have something better. And you do have something better, because distance between two fixed points is a static value -- once you know it, it's not going to change no matter how many times you recalculate it.

That was my first optimization. My second was to limit the weather stations per city (or cities per weather station -- it's symmetrical) to those within 100 miles of each other, since that's about the outside limit for relatable weather in the real world. It works out to just a little over 1 million records.

My third optimization was a (pretty routine, but cool just the same) mapping table pro trick: put all columns in the primary index. Searches on any column then involve a single index seek and automatically return all the data you need without going to the actual record pages. (Since SQL Server uses clustered indexes, it may in fact optimize this by constructing nothing but the index -- it wouldn't surprise me.) According to my analysis tools, queries parameterized by the city id spend only a third as much time returning data as they spend sorting the 1-100 returned rows on distance.
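
For the record, the shape of the thing is roughly this (T-SQL stashed in a C# constant; table and column names invented):

    static class WeatherDb
    {
        // Every column lives in the clustered primary key, so a seek on
        // city_id returns whole rows straight from the index; a second
        // covering index handles the station-side lookups.
        public const string MappingTableDdl = @"
            CREATE TABLE city_station_map (
                city_id        int  NOT NULL,
                station_id     int  NOT NULL,
                distance_miles real NOT NULL,
                PRIMARY KEY CLUSTERED (city_id, station_id, distance_miles)
            );
            CREATE INDEX ix_station
                ON city_station_map (station_id)
                INCLUDE (city_id, distance_miles);";
    }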

BTW, saving precalculated data in mapping tables is a pretty common thing. With internet phone books we found out that searching through thousands of listings for keywords was killing our servers. So we extracted a keyword list, figured out which rows each keyword could be found in, and saved the result in a mapping table. Then instead of searching through every row of the listings table for a business name containing "Mom's" (for example), we just searched the mapping table for "mom" and returned all related listings.
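
Schematically (names invented again):

    static class PhoneBookDb
    {
        // One row per (keyword, listing) pair -- an inverted index -- so a
        // lookup is a seek on keyword_map instead of a LIKE '%mom%' crawl
        // over every listing.
        public const string KeywordLookup = @"
            SELECT l.*
              FROM keyword_map km
              JOIN listings l ON l.listing_id = km.listing_id
             WHERE km.keyword = @keyword;";
    }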

Tony said...

Raymond:

"Specifically, I was using C++. I don't know how much you've worked with C++, but I've found that indirect references can be tricky to learn and syntactically quirky (esp. when you're responsible for managing your own memory)."

At the college I went to, C++ was the standard teaching language. Now that has value, because it forces the student to learn what's really going on under the hood. Having said that, I absolutely loathed chasing pointers around the world and back to get anything done. Excuse my French, but f*ck that!

I remember programming (conceptually) simple recursion in C++ and wanting to shoot myself to ease the pain. In C# I use recursion when called for without dread or much extra effort.

"I rather like LINQ, actually, and MS implemented the parallel version (PLINQ) in v4."

I had to learn LINQ to do maintenance and upgrades on one of our back office applications, because all of the data access was written in LINQ to SQL. But where it really helps my productivity is LINQ to XML.

"You don't need contracts for that. In Python, if the class has a next() method, you can iterate it. Yes, you should really make sure your next() method works properly, but it's a simple interface to implement - call next(), get a cookie, repeat as required. And testing for the interface is simple, too, as it's just checking for a next() method using the reflection tools."

It's still fundamentally a contract, no matter how simple it is to implement. Remember, I'm talking conceptually here -- you have to get *any* interface right, or you fail.

"Contracts are (in my understanding) essentially a formalism of the interface specs you need to use the code anyways. I'm willing to admit they make automated unit testing easier, but I (and a clutch of other programmers I read) remain suspicious of unit testing's overall worth, especially in high-concurrency environments prone to schrodibugs (and other errors difficult to catch at compile-time)."

I have my own skepticism about *automated* unit testing, because you take as much time writing the test definition as you do the code. If you know what your program is supposed to do and where the domain and range boundaries are, it's effective to just build a simple test harness and run a systematic manual test. But nota bene -- on big projects you still have to run the test outside of the overall system context, and for that you do have to build a test harness and use it to work the code out.
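
Something like this is usually all it takes (a toy C# harness; the routine under test and its boundaries are stand-ins):

    using System;

    class Harness
    {
        // Stand-in for the routine under test.
        static string PriceBand(int qty)
        {
            if (qty <= 0) return "invalid";
            return qty <= 100 ? "retail" : "wholesale";
        }

        static void Main()
        {
            // Work the domain and range boundaries systematically,
            // then eyeball the output.
            int[] inputs = { -1, 0, 1, 100, 101, int.MaxValue };
            foreach (int q in inputs)
                Console.WriteLine("PriceBand({0}) = {1}", q, PriceBand(q));
        }
    }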

"I mean that it seems to me that it's nothing more than shuffling validation requirements and state-change risks onto the caller. Preconditions are easier to validate than postconditions, after all (if it were otherwise, we could solve the Halting Problem in the general case)."

The caller may supply you bad parameter data, even if he means to supply good data. This is really common in interactive web apps. At some level of data consumption you have to write validation code. How you factor that out is context dependent. Maybe you have a whole module that does nothing but validate input data and route to either a re-presentation of the form page (validation failure) or a substantive response (validation success).

When I worked on internet Yellow Pages, we in fact used that pattern. It allowed us to build a whole family of relatively lightweight response mechanisms, because we knew we had valid data at that level before we went on to build a (resource intensive) listing page. But for other contexts, maybe you have to build validation into the response mechanism, either because you have only one, or because your architecture is more distributed.
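
The skeleton of that validate-then-route pattern, for the record (a C# sketch; every name in it is invented):

    using System.Collections.Generic;

    static class FormGate
    {
        public static string Handle(Dictionary<string, string> form)
        {
            List<string> errors = Validate(form);
            return errors.Count > 0
                ? RepresentForm(errors)     // failure: redisplay the form
                : BuildListingPage(form);   // success: the expensive page
        }

        // All validation lives here; everything downstream assumes clean data.
        static List<string> Validate(Dictionary<string, string> form)
        {
            var errors = new List<string>();
            string name;
            if (!form.TryGetValue("name", out name) || name.Trim().Length == 0)
                errors.Add("Name is required.");
            return errors;
        }

        static string RepresentForm(List<string> errors)
        {
            return "form page noting: " + string.Join("; ", errors);
        }

        static string BuildListingPage(Dictionary<string, string> form)
        {
            return "listing page for " + form["name"];
        }
    }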

In any case, the point of contracts is that, within the context of an interface, the caller supplies all of the necessary data of the right types and quantities, while the callee returns same. The validity of the data is an orthogonal issue.

Tony said...

Raymond:

"You have no idea of the deep and abiding hatred I harbor for XML and all it represents...

I dearly wish to burn XML to the ground and stick its angle brackets on pikes across the land, as a warning to others.

(In all seriousness, though, give me three people and I'll show you five opinions on attributes vs. elements. And it's so ugly!)"


I try not to have any religions about anything to do with computer science or the information systems business. I take XML for what it is -- a lowest common denominator data interchange format. For that I thank the Bog that it was borned. It allows me to use standard tools with minimal configuration to read documents of arbitrary complexity. Actually, that is its primary practical application.

XSLT is an unmitigated boon to a web feed consumer. If you get your data in XML, you can reform it into a web page using XSLT without having to analyze it in code. This is a good thing when incoming XML might change in format periodically -- which is not all bad, in that it can involve extension and correction -- because you can store the XSLT in the db and edit it at will. No rewriting code on a simple data format change.
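
In .NET terms it's about three lines (file names invented; the stylesheet could just as easily be read out of a database column):

    using System.Xml.Xsl;

    class FeedRenderer
    {
        static void Main()
        {
            // Load the stylesheet, then reform the incoming feed into a
            // page. When the feed format shifts, you edit the stylesheet;
            // this code never changes.
            var xslt = new XslCompiledTransform();
            xslt.Load("feed-to-page.xslt");
            xslt.Transform("incoming-feed.xml", "rendered-page.html");
        }
    }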

WRT attributes and elements, I don't view it as a "vs." issue. They each have their uses. I generally use attributes for non-recurring data up to about 100 characters, simply because it's less verbose and therefore more efficient. Recurring data has to be in elements. Of course you can shave off a few characters per element by putting the value in an attribute with a single character name (e.g. "x='...'") and using the "/>" close. Bigger items should IMO be in element bodies. The "attributes for metadata only" assertion is pure BS.
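
In LINQ to XML terms, the split I mean looks something like this (names invented):

    using System;
    using System.Xml.Linq;

    class XmlShapes
    {
        static void Main()
        {
            // Short, non-recurring scalars as attributes; recurring or
            // bulky data in element bodies.
            var part = new XElement("part",
                new XAttribute("id", 42),
                new XAttribute("price", "19.95"),
                new XElement("description",
                    "Longer free text belongs in an element body."),
                new XElement("supersedes", 17),
                new XElement("supersedes", 23));
            Console.WriteLine(part);
        }
    }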

"That comment was directed with OOP overall more than just contracts per se. I've heard it said, and believe to some extent, that OOP is fundamentally infectious - any code you write which touches it ends up resembling it eventually."

This is IMO a case of "I want your functionality, but I don't want your architectural baggage". But let's look at that from a practical point of view. Back in the Heroic Age, Kernighan and Ritchie handed down the Law: All input and output shall be in plain text. Of course, the reason was that they wanted modularized, pipeable utilities with no notion of execution order. The only way to do that was to ensure that input and output were in precisely the same lowest common denominator format. The problem is that one had to know the design of the output to intelligently use it as input in the next module along the pipeline. The other guy's architectural baggage came at you whether you liked it or not. And it was *your* responsibility to deal with it.

So, really, the other guy's architectural baggage is always in your life, no matter how simple you make the interface. Even death will not release you.

"Also, let me reiterate that I don't like the fail-hard approach. Works well enough at compile time, if all your code is in the same building, but it tends to be opaque and annoying when piecing together code from multiple sources, and it's positively maddening when using external running programs which you can't peek inside."

If you can't peek inside, it's better to fail hard. If I'm consuming an XML feed and it's not right, I don't want to know what happened on the other end. I just want to know that I need to re-request. Kind of like couchdb, y'know?

Tony said...

Raymond:

"Thus, I tend to see contracts as wasted effort more tied to internal corporate structure than end-user functionality. Might just be a matter of different environments and resulting perceptions."

Contracts are about modularity and durability. As was repeatedly beaten into our heads at school -- and into my individual head subsequently by more than one senior programmer on the job -- it doesn't matter what goes on inside an object as long as there are no externally detectable changes in interface structure or output validity. So say you find a new or better way to do something, or you find out that something needs fixing. With strict contract control of object definition, nobody has to rewrite their software to consume what you have to offer. They just need to load the updated .dll for your module.

Well, with reflection anyway. If you're dealing with C++ .dll's you have my undying sympathy for every system rebuild.

"Oh, the usual: Google (mapreduce, GFS, etc), Amazon (Dynamo, their distributed key-value store which they use for pretty much their entire backend), Facebook (Thrift and Cassandra), Microsoft (LINQ and friends, the backend to Azure), Sun pre-Oracle-buyout (grid computing, ZFS), anyone using Hadoop (the above plus Yahoo, IBM, HP, Microsoft, and a gazillion others), and the ever-expanding roster of "NoSQL" datastores."

These represent a primarily static data view of the world that presumes once a record is constructed, it will mostly (as in 99.9999% of touches) only be read. This may even be valid in a lot of web serving instances. But the huge pile of metadata required to assemble the various component documents into complete web pages still has to reside on relational systems, because it's much more rigidly structured (in order to be usable by response construction code) and is accessed in predictable ways, but in unpredictable combinations.

Also, the key-value construction of the documents can waste a lot of space in key storage and indexing.

"Almost. It's harder to be satisfied with that when key parts of your dataset aren't your own, but pushed onto you on a regular basis. Being able to define, say, item-specific discounts as a function of list price (or cost price, or whatever) is something which has to be allowed internally, when your price list is updated monthly from the Fatherland, and whatever transformations you make by hand are overwritten."

Here's how I would do it. Presumably every part on your parts list has a part number, right? So your extension consists of a table of discount factors associated with part numbers. If you have programmatic access to your underlying data, you just run the contents of that table against the db as soon as possible after the updates are pushed. If you don't have programmatic access, you can at least generate a part number-oriented list of discounts by running your discount list against an export of the new price list.
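
The whole thing is one extension table and one statement (T-SQL sketch; the schema is invented):

    static class DealerDb
    {
        // Re-run after each monthly push. The discounts live in their own
        // table keyed by part number, so the list-price update can't
        // clobber them.
        public const string ReapplyDiscounts = @"
            UPDATE p
               SET p.net_price = p.list_price * d.discount_factor
              FROM parts p
              JOIN part_discounts d ON d.part_number = p.part_number;";
    }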

Aside from the specific solutions you might try, once again we seem to have a case of poor design here, not poor tools. It would be a simple task to add the capability to save user-defined discount factors by part number that would *not* be affected by list price updates. Considering how much autonomy dealerships usually have in pricing and inventory, I'm surprised that it's not a standard capability.

"Oh, that's very much in play here, seeing as how I could probably replace 90% of the system in a month or two with something far better (the other 10% I'd have to have an accounting degree for).

They don't, IMO, and while I know it's not universal, it seems to be depressingly common."


They bank on guys like you not having a complete knowledge of all the domains wrapped in one system.

Tony said...

Raymond:

"For example, there's no good reason why my parts pricing has to be in the same database as the accounting records - and in fact, a few of the manipulations I have to do for practical reasons inherently damage the accuracy of the accounting data."

There's one use I can think of right off the top of my head -- being able to readily figure the inventory value for tax purposes and other audits. And yes, changes you make could damage accuracy in reporting, especially if inventory value is figured based on list price and not sale price.

"There are a whole host of assumptions the designers made about day-to-day operations which don't hold, and I lack any decent mechanism for rectifying them. Then there's the interface problems, but who doesn't have those?"

"True enough in general - in my case, though, it's not that consistency and validity are prioritized (the backend hasn't changed much, if at all, in seven years or more) but an obsession by the supplier with webifying the creaking curses-based terminal interface and implementing what's essentially Excel in a browser (but read-only, of course) at the expense of improving the most common end-user functionality. If you're only looking for reporting, the tools at hand are pretty good - but the underlying business logic is limited to how the system designers thought a car dealership is supposed to work, as opposed to anything resembling how it actually does. I'm sure you're quite familiar with that class of ailment."

Incomplete or simply bad domain knowledge can do that to you. It sounds to me like somebody invoked the "there's a template for that" requirements process.

Bog darn them to heck for that.

Raymond said...

Tony:

Since Blogger's become hideously unwieldy, want to switch to email? I'm sure nobody else is even paying attention...

Rick said...

One of these days I gotta settle back and actually *read* this exchange, which looks fascinating but ... demanding.

jollyreaper said...

Can we use an extreme sanction against Russian spammers?

Anonymous said...

The Venture Star was boosted to relativistic speeds at 1g by a laser light sail. It used the fusion engines for braking, presumably at a much lower acceleration.
