Java 7 Idea: Extensible Strings
In some ways it's a shame that java.lang.String is marked 'final' in Java. If it weren't final, you could inherit from java.lang.String to create strings with extra features, or you could add a marker interface to declare that a String has some property. For example, it would be neat to be able to track whether a String had been:
- Checked for dangerous characters from the web
- Internationalized
- Processed by some system
Allowing marker interfaces like this could lead to some nice extra type-checking:
public SafeString checkForUnsafeCharacters(String s) {
// do some checking and throw or create a SafeString
}
public void process(SafeString s) {
// do something knowing that the developer can't forget
// to check the string before it is passed in
}
Marker interfaces on strings can be a handy way to declare that this string has some property that normal strings don't.
However there are at least 2 good reasons why java.lang.String is final:
- Security: the system can hand out sensitive bits of read-only information without worrying that they will be altered
- Performance: immutable data is very useful in making things thread-safe.
So is it possible to have the best of both worlds: to allow marker interfaces and additional properties without breaking assumptions about the immutability of the string itself? And is this possible without breaking backwards compatibility?
In Javas 1 to 6, java.lang.String is defined like this:
public final class String {
public int length() ...
... // other methods and properties
}
I think we might be able to safely change this to:
public class String {
public final int length() ...
... // everything made final
}
I don't think this breaks backwards compatibility anywhere because it mostly legalizes something that would previously have been a compile error, and the immutability of the core String is as unbroken as it was in Javas 1-6, although obviously the same guarantees could not be made about all extensions.
Declaring finality in this way allows marker interfaces, which might in some cases allow developers to create APIs that enforce some properties on strings that might previously have been forgotten.
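The proposal cannot be tried against java.lang.String itself, but its shape can be sketched with a stand-in class. In the sketch below, Text, Safe and SafeText are hypothetical names invented for illustration, and the check for '<' and '>' is only a placeholder for real validation.

```java
public class ExtensibleTextDemo {

    // Hypothetical stand-in for a non-final String: the class is open,
    // but every method is final, so subclasses cannot change its behaviour.
    public static class Text {
        private final String chars;

        public Text(String chars) { this.chars = chars; }

        public final int length() { return chars.length(); }

        public final char charAt(int index) { return chars.charAt(index); }

        @Override
        public final String toString() { return chars; }
    }

    // Marker interface: no methods, only a compile-time promise.
    public interface Safe { }

    // The subclass adds the marker but cannot override anything in Text.
    public static final class SafeText extends Text implements Safe {
        private SafeText(String chars) { super(chars); }
    }

    // Placeholder check (an assumption): rejects angle brackets only.
    public static SafeText checkForUnsafeCharacters(String s) {
        if (s.indexOf('<') >= 0 || s.indexOf('>') >= 0) {
            throw new IllegalArgumentException("unsafe character in: " + s);
        }
        return new SafeText(s);
    }

    // Accepts only checked text; passing a plain Text does not compile.
    public static int process(SafeText s) { return s.length(); }

    public static void main(String[] args) {
        SafeText safe = checkForUnsafeCharacters("hello");
        Text plain = safe;                  // no cast needed: a SafeText is a Text
        System.out.println(process(safe));  // prints 5
        System.out.println(plain);          // prints hello
    }
}
```

Because the methods of Text are final, code that receives a SafeText through a Text reference sees exactly the behaviour of the base class; the subclass contributes only its type.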
Can anyone think of anything that I've overlooked?
Equality, intern and the pool of strings
"Compares this string to the specified object. The result is true if and only if the argument is not null and is a String object that represents the same sequence of characters as this object."
Assuming java.lang.String wasn't final, a SafeString could equal a String, and vice versa, because they'd represent the same sequence of characters.
What would happen if you applied intern to a SafeString -- would the SafeString go into the JVM's string pool? The ClassLoader and all objects the SafeString held references to would then get locked in place for the lifetime of the JVM. You'd get a race condition about who could be the first to intern a sequence of characters -- maybe your SafeString would win, maybe a String, or maybe a SafeString loaded by a different classloader (thus a different class).
If you won the race into the pool, this would be a true singleton and people could access your whole environment (sandbox) through reflection and secretKey.intern().getClass().getClassLoader().
Or the JVM could block this hole by making sure that only concrete String objects (and no subclasses) were added to the pool.
If equals was implemented such that SafeString != String then SafeString.intern != String.intern, and SafeString would have to be added to the pool. The pool would then become a pool of <Class, String> instead of <String> and all you'd need to enter the pool would be a fresh classloader.
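For reference, this is how the pool behaves today, while only plain String objects can enter it. The class and method names below are made up for the example; String.intern() itself is the real API. The open question above is what canonical() should return if the argument were an instance of a String subclass.

```java
public class InternDemo {

    // Returns the pooled instance for the given character sequence.
    public static String canonical(String s) {
        return s.intern();
    }

    public static void main(String[] args) {
        String literal = "secret";               // literals are interned automatically
        String computed = new String("secret");  // distinct object, equal contents

        System.out.println(literal == computed);             // false
        System.out.println(literal.equals(computed));        // true
        System.out.println(literal == canonical(computed));  // true
    }
}
```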
Re: Java 7 Idea: Extensible Strings
This is a compile-time-only view; the runtime is not affected.
It's a form of mixin, and it's what you're really after, except that you're used to using inheritance for code reuse, rather than other forms.
A problem with inheritance for code reuse (there are many, here's just one) is that it's inflexible. If you want UnsafeString, a subclass of String that adds 'SafeString checkString()', and then you also want Filename, a subclass of String that adds 'boolean exists()', but then you want something that has both checkString and exists, you're a bit stuck, because you're really just trying to add a method to a class, not to make a new class.
Try Scala's views, or Ruby's open classes. Or, better, CLOS, where polymorphic methods aren't actually part of classes.
Re: Java 7 Idea: Extensible Strings
What about something like this:
public class Safe<T>
{
public final T value;
public Safe(T value)
{
this.value = value;
}
}
This is the very simple version, but it shows the right direction.
Now you can do what you wanted:
public Safe<String> checkForUnsafeCharacters(String s) {
// ...
}
public void process(Safe<String> s) {
// ...
}
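A slightly fuller sketch of the same wrapper idea, assuming you want the check to be the only way to obtain a Safe<String>. The private constructor and the list of rejected characters are my assumptions, not part of the comment above.

```java
public class SafeWrapperDemo {

    // Immutable wrapper. The constructor is private, so only code in
    // this file (here, the check below) can create an instance.
    public static final class Safe<T> {
        public final T value;

        private Safe(T value) {
            this.value = value;
        }
    }

    // Placeholder check (an assumption): rejects a few HTML metacharacters.
    public static Safe<String> checkForUnsafeCharacters(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '<' || c == '>' || c == '&') {
                throw new IllegalArgumentException("unsafe character: " + c);
            }
        }
        return new Safe<String>(s);
    }

    // The parameter type documents and enforces that the check has run.
    public static String process(Safe<String> s) {
        return "processed: " + s.value;
    }

    public static void main(String[] args) {
        Safe<String> checked = checkForUnsafeCharacters("hello");
        System.out.println(process(checked));  // prints processed: hello
        // process("hello");                   // would not compile
    }
}
```

The underlying String is untouched, so nothing about equality, hashing or interning changes; the cost is the extra .value when the raw string is needed.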
Re: Java 7 Idea: Extensible Strings
Using StringBuilder or StringBuffer is also no good because they are mutable. This tweak to string is still immutable at heart.
Re: Java 7 Idea: Extensible Strings
In the second case - I would expect that you define constructors for MyString as for any other class.
Re: Java 7 Idea: Extensible Strings
I've never been fond of it because I think it's the job of the compiler and IDE to handle that information. Joel points out that the notation gets a bad name from people abusing it for plain type information (I agree that's the compiler's job), but I think the same goes for deeper type information too.
Re: Java 7 Idea: Extensible Strings
I didn't get your point. Concatenation still wouldn't work:
public void process(SafeString s) {
String a = "abc" + s; // compilation error
String b = "abc" + (String) s; // that would work
}
You would need a cast, as SafeString isn't a String. The power of inheritance comes with polymorphism (which can also be accomplished with interfaces), but if all String methods are final and polymorphism isn't possible, why inherit from it?
Just use composition (and delegation if needed), as pointed out before. Best to leave String final and avoid inheritance.
Re: Java 7 Idea: Extensible Strings
- The issue you raise is (for me) mainly about tools: I would want the strings to be statically checked. One way of doing this is to have a checker with plugins for certain constructor invocations and static method invocations.
new MySQLStatement("sql code");
Tools.createSQLStatement("sql code");
With plugins for MySQLStatement and createSQLStatement, one can use the checker to make statically sure that the syntax of the strings is well-formed. Ideally, the syntax highlighting would also be adjusted.
- Note that in functional languages, people are a lot less shy about wrapping a constructor around data (like MySQLStatement above). And this is often a good solution in Java, as well.
- In RDF, strings can be typed (they are called "typed literals" there) via "string"^^XMLDatatypeURI. That is, RDF does not even have integers and booleans; it uses "123"^^xsd:integer and "true"^^xsd:boolean. This idea looks similar. If annotations are ever allowed on strings, one could closely emulate RDF in this regard.
Re: Java 7 Idea: Extensible Strings
- methods such as createSQLStatement() can be completely transparent markers (i.e., they can have a single String argument and return a String).
- Furthermore: I like the added type-safety of wrapping a String inside another object and don't mind having to dig into it.
Re: Java 7 Idea: Extensible Strings
For run time, a problem with marking strings is that by their immutable nature, one string is as good as another. E.g. "A" equals "A" equals "A"... and they can be used interchangeably. If they are computed strings, they won't have referential equality. But they will be equal.
Marking one string "A" in a special manner breaks this. An inequality is introduced. One string "A" is now different than another string "A". One string "SELECT * FROM FOO" is different than another "SELECT * FROM FOO". But should it be? Wouldn't it be better to mark all "SELECT * FROM FOO" as valid SQL instead of one individual string reference? Following this route leads you away from marker interfaces into caches and sets of vetted strings (like strings known to be valid SQL or known not to contain funny characters...). This is old school Java and doesn't need a special language extension.
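A minimal sketch of the "set of vetted strings" route, assuming a single-threaded caller. VettedStrings and its methods are hypothetical names, and the startsWith test is a stand-in for whatever real validation would apply.

```java
import java.util.HashSet;
import java.util.Set;

public class VettedStrings {

    // Membership is by value (equals/hashCode), not by reference.
    private final Set<String> vetted = new HashSet<String>();

    // Placeholder validation (an assumption): accept only SELECT statements.
    public void vet(String sql) {
        if (!sql.startsWith("SELECT ")) {
            throw new IllegalArgumentException("not vetted: " + sql);
        }
        vetted.add(sql);
    }

    public boolean isVetted(String s) {
        return vetted.contains(s);
    }

    public static void main(String[] args) {
        VettedStrings registry = new VettedStrings();
        registry.vet("SELECT * FROM FOO");

        // Built at run time, so it is a different object with equal contents.
        String computed = new StringBuilder("SELECT * ").append("FROM FOO").toString();

        System.out.println(registry.isVetted(computed));           // true
        System.out.println(registry.isVetted("DROP TABLE FOO"));   // false
    }
}
```

Every equal string is vetted at once, which is the point made above; the trade-off is that the check moves from compile time to a run-time lookup.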
Re: Java 7 Idea: Extensible Strings
hashCode() has a contract: hash-based collections expect a key's hash to stay the same for as long as the key is in use. That is why StringBuilder, which is mutable, does not override hashCode(). Maps also rely heavily on the equals() implementation.
Your proposal would yield different hashCode() and equals() results depending on when they are called, which would break the Map implementations we all rely on. My .02 euro :)
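To make the Map concern concrete, here is a small sketch using a mutable ArrayList as a key. It is an analogy, not the proposed string class. It shows what happens when a key's hashCode() changes after insertion. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MutableKeyDemo {

    // Inserts a mutable key, mutates it, then looks it up again.
    public static Integer lookupAfterMutation() {
        Map<List<String>, Integer> map = new HashMap<List<String>, Integer>();
        List<String> key = new ArrayList<String>();
        key.add("a");
        map.put(key, 1);

        key.add("b");          // changes key.hashCode() while it sits in the map
        return map.get(key);   // the entry was stored under the old hash
    }

    // StringBuilder inherits identity-based equals() from Object.
    public static boolean buildersEqual() {
        return new StringBuilder("a").equals(new StringBuilder("a"));
    }

    public static void main(String[] args) {
        System.out.println(lookupAfterMutation());  // null
        System.out.println(buildersEqual());        // false
    }
}
```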