
Java 7 Idea: Extensible Strings

In some ways it's a shame that java.lang.String is marked 'final' in Java. If it weren't final, you could inherit from java.lang.String to create strings with extra features, or you could extend it with a marker interface to declare that a String has some property. For example, it would be neat to be able to track whether a String had been:

  • Checked for dangerous characters from the web
  • Internationalized
  • Processed by some system

Allowing marker interfaces like this could lead to some nice extra type-checking:

public SafeString checkForUnsafeCharacters(String s) {
  // do some checking and throw or create a SafeString
}

public void process(SafeString s) {
  // do something knowing that the developer can't forget
  // to check the string before it is passed in
}

Marker interfaces on strings can be a handy way to declare that a string has some property that normal strings don't.

However there are at least 2 good reasons why java.lang.String is final:

  • Security: the system can hand out sensitive bits of read-only information without worrying that they will be altered
  • Performance: immutable data is very useful in making things thread-safe.

So is it possible to have the best of both worlds - to allow marker interfaces and additional properties without breaking assumptions about the immutability of the string itself? And is this possible without breaking backwards compatibility?

In Javas 1 to 6, java.lang.String is defined like this:

public final class String {
  public int length() ...
  ... // other methods and properties
}

I think we might be able to safely change this to:

public class String {
  public final int length() ...
  ... // everything made final
}

I don't think this breaks backwards compatibility anywhere because it mostly legalizes something that would previously have been a compile error, and the immutability of the core String is as unbroken as it was in Javas 1-6, although obviously the same guarantees could not be made about all extensions.

Declaring finality in this way allows marker interfaces, which might in some cases allow developers to create APIs that enforce some properties on strings that might previously have been forgotten.

Can anyone think of anything that I've overlooked?



Re: Java 7 Idea: Extensible Strings

What's wrong with coding to the CharSequence interface?

Equality, intern and the pool of strings

Extending String would play havoc with equals and intern. The Javadoc for equals says:
"Compares this string to the specified object. The result is true if and only if the argument is not null and is a String object that represents the same sequence of characters as this object."
Assuming java.lang.String wasn't final, a SafeString could equal a String, and vice versa; because they'd represent the same sequence of characters.
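
The point about equality can be sketched with stand-in classes, since java.lang.String itself cannot be subclassed today. Text and SafeText below are hypothetical names, and Text makes equals and hashCode final to mirror the "everything made final" idea from the post. This is an illustration under those assumptions, not a prediction of how a real JDK change would behave:

```java
import java.util.HashSet;
import java.util.Set;

class EqualsSymmetryDemo {

    // Hypothetical stand-in for a non-final String whose methods are all final.
    static class Text {
        private final String chars;

        Text(String chars) { this.chars = chars; }

        // Equality depends only on the character sequence, as in String.equals.
        @Override public final boolean equals(Object o) {
            return o instanceof Text && ((Text) o).chars.equals(chars);
        }

        @Override public final int hashCode() { return chars.hashCode(); }
    }

    // Hypothetical "marked" subclass; it cannot change equals or hashCode.
    static class SafeText extends Text {
        SafeText(String chars) { super(chars); }
    }

    public static void main(String[] args) {
        Text plain = new Text("abc");
        Text safe = new SafeText("abc");

        System.out.println(plain.equals(safe)); // true
        System.out.println(safe.equals(plain)); // true

        // In a hash-based collection the marked and unmarked values are the same element.
        Set<Text> set = new HashSet<>();
        set.add(plain);
        System.out.println(set.contains(safe)); // true
    }
}
```

So with inherited equality the marker carries no weight at run time: a collection keyed on these values cannot tell a checked string from an unchecked one.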

What would happen if you applied intern to a SafeString -- would the SafeString go into the JVM's string pool? The ClassLoader and all objects the SafeString held references to would then get locked in place for the lifetime of the JVM. You'd get a race condition about who could be the first to intern a sequence of characters -- maybe your SafeString would win, maybe a String, or maybe a SafeString loaded by a different classloader (thus a different class).

If you won the race into the pool, this would be a true singleton and people could access your whole environment (sandbox) through reflection and secretKey.intern().getClass().getClassLoader().

Or the JVM could block this hole by making sure that only concrete String objects (and no subclasses) were added to the pool.

If equals was implemented such that SafeString != String then SafeString.intern != String.intern, and SafeString would have to be added to the pool. The pool would then become a pool of <Class, String> instead of <String> and all you'd need to enter the pool would be a fresh classloader.
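
For reference, this is how the pool behaves with today's final String. The class and method names are made up for the sketch; only String.intern() and StringBuilder are real APIs here:

```java
class InternPoolDemo {

    // Built at run time, so the result is a fresh object rather than the pooled literal.
    static String build() {
        return new StringBuilder("ab").append('c').toString();
    }

    public static void main(String[] args) {
        String literal = "abc";      // string literals are interned automatically
        String computed = build();

        System.out.println(computed == literal);          // false: different objects
        System.out.println(computed.equals(literal));     // true: same characters
        System.out.println(computed.intern() == literal); // true: intern returns the pooled instance
    }
}
```

The last line is the behaviour at stake in this comment: intern() hands back whichever equal instance reached the pool first, so a pool that admitted subclasses would have to decide which class of object callers get back.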

Equality, intern and the pool of strings

I hit "Reply" instead of "Add a comment". That should have been a reply to the original blog.

Re: Java 7 Idea: Extensible Strings

The only benefit you get from making checkString part of String is that you can call it using "blah".checkString(), rather than checkString("blah"). Scala provides 'views' of objects, so you can say that all Strings in a certain file, or package, have the checkString method, so you can do "blah".checkString().

This is a compile-time-only view; the runtime is not affected.

It's a form of mixin, and it's what you're really after, except that you're used to using inheritance for code reuse, rather than other forms.

A problem with inheritance for code reuse (there are many, here's just one) is that it's inflexible. If you want UnsafeString, a subclass of String that adds 'SafeString checkString()', and then you also want Filename, a subclass of String that adds 'boolean exists()', but then you want something that has both checkString and exists, you're a bit stuck, because you're really just trying to add a method to a class, not to make a new class.

Try Scala's views, or Ruby's open classes. Or, better, CLOS, where polymorphic methods aren't actually part of classes.

Re: Java 7 Idea: Extensible Strings

Have you considered CharSequence? This is an interface (hence non-final) which is implemented by java.lang.String and could be extended and/or marked. With some dynamic proxy magic, you could write a function to add all appropriate markings. E.g. The input would be a CharSequence, but given a String it'd return a CharSequence/SafeString combo object; given a SafeString return the same SafeString right back; do internationalization ...
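
A rough sketch of the CharSequence route, using a plain class rather than the dynamic proxy mentioned above. The Safe marker interface, the SafeString class and the rule that rejects '<' and '>' are all assumptions made for the example:

```java
class CharSequenceMarkerDemo {

    // Hypothetical marker interface: a CharSequence that has been checked.
    interface Safe extends CharSequence {}

    // Immutable implementation that delegates to a wrapped String.
    static final class SafeString implements Safe {
        private final String value;

        private SafeString(String value) { this.value = value; }

        // Returns the argument unchanged if it is already marked, otherwise checks it.
        static Safe check(CharSequence input) {
            if (input instanceof Safe) {
                return (Safe) input;
            }
            String s = input.toString();
            // Assumed rule for this sketch: angle brackets count as unsafe.
            if (s.indexOf('<') >= 0 || s.indexOf('>') >= 0) {
                throw new IllegalArgumentException("unsafe character in: " + s);
            }
            return new SafeString(s);
        }

        @Override public int length() { return value.length(); }
        @Override public char charAt(int index) { return value.charAt(index); }
        @Override public CharSequence subSequence(int start, int end) {
            return value.subSequence(start, end);
        }
        @Override public String toString() { return value; }
    }

    // Only accepts input that has been through check().
    static String process(Safe s) {
        return "processed: " + s;
    }

    public static void main(String[] args) {
        Safe safe = SafeString.check("hello");
        System.out.println(process(safe));                 // processed: hello
        System.out.println(SafeString.check(safe) == safe); // true
    }
}
```

The trade-off is that a Safe value is not a java.lang.String, so APIs that demand String still need a toString() call.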

Re: Java 7 Idea: Extensible Strings

The Managing Gigabytes for Java project (http://mg4j.dsi.unimi.it/) has a matured mutable extendable string implementation. http://mg4j.dsi.unimi.it/docs/it/unimi/dsi/mg4j/util/MutableString.html

Re: Java 7 Idea: Extensible Strings

What about something like this:

public class Safe<T>
{
  public final T value;

  public Safe(T value)
  {
     this.value = value;
  }

}

This is the very simple version, but it shows the right direction.
Now you can do what you wanted:

public Safe<String> checkForUnsafeCharacters(String s) {
  // ...
}

public void process(Safe<String> s) {
  // ...
}
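
One way the placeholders might be filled in is sketched below. It differs from the comment above in one respect: the constructor is private, so a Safe<String> can only be obtained through the check. The angle-bracket rule and the body of process are assumptions for illustration:

```java
class SafeWrapperDemo {

    // Generic wrapper; the private constructor forces callers through a checking method.
    static final class Safe<T> {
        public final T value;

        private Safe(T value) { this.value = value; }
    }

    static Safe<String> checkForUnsafeCharacters(String s) {
        // Assumed rule for this sketch: angle brackets count as unsafe.
        if (s.indexOf('<') >= 0 || s.indexOf('>') >= 0) {
            throw new IllegalArgumentException("unsafe character in: " + s);
        }
        return new Safe<>(s);
    }

    // The parameter type documents, and enforces, that the check has happened.
    static String process(Safe<String> s) {
        return "[" + s.value + "]";
    }

    public static void main(String[] args) {
        System.out.println(process(checkForUnsafeCharacters("hello"))); // [hello]
    }
}
```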

Re: Java 7 Idea: Extensible Strings

I think it's a shame that java.util.Date is mutable, but a final String class is OK. Use CharSequence or StringBuilder. I can agree on MutableString implementing CharSequence, but java.lang.String should always be final.

Re: Java 7 Idea: Extensible Strings

Lots of people misunderstanding the point here. I want the implementation of String so using CharSequence is not going to work - that's the interface without the implementation.

Using StringBuilder or StringBuffer is also no good because they are mutable. This tweak to string is still immutable at heart.

Re: Java 7 Idea: Extensible Strings

It's a culture issue. Anything that should be simple is made hard in Java land. Maybe the authors of String class have a strong reason.

Re: Java 7 Idea: Extensible Strings

Nice idea, however, what about these scenarios: String abc = "abc"; MyString def = new MyString("def"); result = abc + def; Furthermore: MyString abc = "aaa"; I guess we will also need operator overloading.

Re: Java 7 Idea: Extensible Strings

In the first case of adding strings together, since MyString is-a String, there should be no issue, however since this is done by the compiler - there could be work to support it in javac.

In the second case - I would expect that you define constructors for MyString as for any other class.

Re: Java 7 Idea: Extensible Strings

About the first example above: abc+def is straightforward, but what about def+abc or def+def? Now let's invent MyString2 ghi = new MyString2("ghi"); What about def+ghi regarding the type of "result"? If I think of (+) as a method call which always returns String, that's not too useful. Really, maybe some javac logic could solve this.
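
In current Java the static type of a string concatenation is always String, whatever the other operand is. The sketch below shows that with a hypothetical MyString wrapper (not a String subclass, since that cannot be written today):

```java
class ConcatTypeDemo {

    // Hypothetical wrapper standing in for a String subclass.
    static final class MyString {
        private final String value;

        MyString(String value) { this.value = value; }

        @Override public String toString() { return value; }
    }

    // String + object: the object is converted with toString(), the result is a String.
    static Object plainThenMarked(String left, MyString right) {
        return left + right;
    }

    // object + String: the same rule applies when the String is on the right.
    static Object markedThenPlain(MyString left, String right) {
        return left + right;
    }

    public static void main(String[] args) {
        Object r1 = plainThenMarked("abc", new MyString("def"));
        Object r2 = markedThenPlain(new MyString("def"), "abc");

        System.out.println(r1 + " " + (r1 instanceof String)); // abcdef true
        System.out.println(r2 + " " + (r2 instanceof String)); // defabc true
    }
}
```

With this wrapper, def+def does not compile because neither operand is a String. If MyString really were a String subclass it would compile, and under the existing rule the result would be a plain String with the marker lost.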

Re: Java 7 Idea: Extensible Strings

I really don't understand what you are trying to do. Why can't you simply wrap a string in another object for example? That would give you any flexibility you need - no need to change the String class!

Re: Java 7 Idea: Extensible Strings

If you wrap a string, then the wrapped object is not a string, so you can't pass the objects on without digging into the object.

Having marker interfaces allows us to treat special strings as strings, so concatenation still works for example.

Re: Java 7 Idea: Extensible Strings

I think the principal argument against extending strings for the reasons you've outlined is that the "state" (internationalized, normalized against unsafe characters, etc.) could be better expressed via Apps Hungarian notation, e.g. "String safeLoginStr = normalize(loginStr);". That way, just LOOKING at the code, you know which parts are using the unsafe string, and which are using the safe string. To be tremendously cliche, here's a link to an article Joel Spolsky wrote on the subject. I'm not trying to give you the Slashdot answer (Q: "How do I do this in PHP?", A: "Switch to Ruby!") but rather just point out that there's an alternative solution to the problem, and given that there IS a solution, the Java implementors at Sun might be of the impression that the risks of letting you extend String (whatever their extent may be) might outweigh the benefits.

Re: Java 7 Idea: Extensible Strings

Yeah - If you're into the polish thing, then it's a great solution.

I've never been fond of it because I think it's the job of the compiler and IDE to handle that information. Joel points out that the notation gets a bad name from people abusing it for plain type information (I agree that's the compiler's job), but I think the same goes for deeper type information too.

Re: Java 7 Idea: Extensible Strings

I didn't get your point. Concatenation still wouldn't work:

public void process(SafeString s) {
  String a = "abc" + s; // compilation error
  String b = "abc" + (String) s; // that would work
}

You would need a cast, as SafeString isn't a String. The power of inheritance comes with polymorphism (which can also be accomplished with interfaces), but if all String methods are final and polymorphism isn't possible, why inherit from it?

Just use composition (and delegation if needed), as pointed out before. Best to leave String final and avoid inheritance.

Re: Java 7 Idea: Extensible Strings

Eh? SafeString is-a String - that's the point of making String non-final so you can inherit from it.

Re: Java 7 Idea: Extensible Strings

Wasn't your SafeString only a marker interface?

Re: Java 7 Idea: Extensible Strings

Looks like a great idea. I've wanted to extend String in the past, and if I'd had this I could have done it safely. I guess the thing to do is to patch the JDK and release the patch, so people can have a play and see if there are further issues beyond what can be discussed on the blog thread.

Re: Java 7 Idea: Extensible Strings

Great idea! A few observations:
  • The issue you raise is (for me) mainly about tools: I would want the strings to be statically checked. One way of doing this is to have a checker with plugins for certain constructor invocations and static method invocations.
    new MySQLStatement("sql code");
    Tools.createSQLStatement("sql code");
    
    With plugins for MySQLStatement and createSQLStatement, one can use the checker to make statically sure that the syntax of the strings is well-formed. Ideally, the syntax-highlighting would also be adjusted.
  • Note that in functional languages, people are a lot less shy about wrapping a constructor around data (like MySQLStatement above). And this is often a good solution in Java, as well.
  • In RDF, strings can be typed (they are called "typed literals" there) via "string"^^XMLDatatypeURI. That is, RDF does not even have integers and booleans; it uses "123"^^xsd:integer and "true"^^xsd:boolean. This idea looks similar. If annotations are ever allowed on strings, one could closely emulate RDF in this regard.

Re: Java 7 Idea: Extensible Strings

Follow-up to my previous post:
  • methods such as createSQLStatement() can be completely transparent markers (i.e., they can have a single String argument and return a String).
  • Furthermore: I like the added type-safety of wrapping a String inside another object and don't mind having to dig into it.

Re: Java 7 Idea: Extensible Strings

Are you talking about static (development-time) checking of SQL statements in a Java IDE? Maybe this could be done (better?) with annotations. Instead of "marking" a string as containing SQL, annotate it as such.

For run time, a problem with marking strings is that by their immutable nature, one string is as good as another. E.g. "A" equals "A" equals "A"... and they can be used interchangeably. If they are computed strings, they won't have referential equality. But they will be equal.

Marking one string "A" in a special manner breaks this. An inequality is introduced. One string "A" is now different from another string "A". One string "SELECT * FROM FOO" is different from another "SELECT * FROM FOO". But should it be? Wouldn't it be better to mark all "SELECT * FROM FOO" as valid SQL instead of one individual string reference? Following this route leads you away from marker interfaces into caches and sets of vetted strings (like strings known to be valid SQL or known not to contain funny characters...). This is old school Java and doesn't need a special language extension.
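
The "set of vetted strings" approach could look roughly like this. The class name Vetted and its methods are invented for the sketch, and a real version would need to think about thread safety and unbounded growth:

```java
import java.util.HashSet;
import java.util.Set;

class VettedStringsDemo {

    // Registry of values that have passed some check, keyed by content rather than identity.
    static final class Vetted {
        private final Set<String> known = new HashSet<>();

        void approve(String s) { known.add(s); }

        boolean isApproved(String s) { return known.contains(s); }
    }

    public static void main(String[] args) {
        Vetted validSql = new Vetted();
        validSql.approve("SELECT * FROM FOO");

        // A separately computed string with the same characters is a different object...
        String computed = new StringBuilder("SELECT * ").append("FROM FOO").toString();

        // ...but it is still recognised, because the set compares by equals/hashCode.
        System.out.println(validSql.isApproved(computed));             // true
        System.out.println(validSql.isApproved("DROP TABLE FOO"));     // false
    }
}
```

Unlike a marker on one instance, the approval here follows the value, which is the distinction the comment is drawing. The cost is a lookup at run time instead of a check by the compiler.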

Re: Java 7 Idea: Extensible Strings

Have you thought about using a wrapper class? Just make the String a gettable attribute so you can still use String methods on it - but keep any other attributes you require as further boolean/integer values. These could be reset by any method which adds to or replaces the String.

Re: Java 7 Idea: Extensible Strings

Making Strings mutable would break Maps.

hashCode() has a contract that Maps rely on: the value must not change while the object is used as a key. That is why StringBuilder does not override hashCode(). Maps also rely heavily on the equals() implementation.

Your proposal would yield different hashCode() and equals() results depending on when they are called, which would break the Map implementations we all rely on. My .02 euro :)
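
The failure mode described here can be reproduced with any key whose hash depends on mutable state. MutableText below is a hypothetical class written for the sketch. It is not the String subclass from the post, which keeps String's own state immutable:

```java
import java.util.HashMap;
import java.util.Map;

class MutableKeyDemo {

    // Hypothetical mutable string-like key: equals and hashCode follow the current contents.
    static final class MutableText {
        private final StringBuilder chars;

        MutableText(String initial) { this.chars = new StringBuilder(initial); }

        void append(String more) { chars.append(more); }

        @Override public boolean equals(Object o) {
            return o instanceof MutableText
                    && ((MutableText) o).chars.toString().equals(chars.toString());
        }

        @Override public int hashCode() { return chars.toString().hashCode(); }
    }

    public static void main(String[] args) {
        Map<MutableText, String> map = new HashMap<>();
        MutableText key = new MutableText("abc");
        map.put(key, "value");

        key.append("d"); // the key's hash changes while it sits in the map

        System.out.println(map.get(key));                    // null: looked up under the new hash
        System.out.println(map.get(new MutableText("abc"))); // null: old hash matches, equals does not
        System.out.println(map.size());                      // 1: the entry is still there, but unreachable
    }
}
```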

