Uses for special characters in Java code

Overview

Ever wondered how you can write code like this in Java?
    if (⁀‿⁀ == ⁀⁔⁀ || ¢ + ¢ == ₡)

Background

Underscores have long been used in C-like languages such as Java to distinguish field and method names.

It is common to see a leading underscore like _field or an underscore in a constant like UPPER_CASE. In Java, the $ also appears in the names of compiler-generated classes and accessor methods.

Notes for the SCJP exam state:

"Identifiers must start with a letter, a currency character ($), or a connecting character such as the underscore ( _ ). Identifiers cannot start with a number!"

This leads to the question: what other connecting characters are there?
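
The rule quoted above can be tried out with the standard Character.isJavaIdentifierStart and Character.isJavaIdentifierPart methods. What follows is only a minimal sketch: the names IdentifierCheck and isValidIdentifier are made up for illustration, and the check covers the character rules only, so it does not reject reserved words.

```java
// Hypothetical helper: checks the character rules for identifiers only.
// It does not reject keywords such as "int", or a lone "_" (a keyword since Java 9).
class IdentifierCheck {

    static boolean isValidIdentifier(String name) {
        if (name.isEmpty()) {
            return false;
        }
        // The first code point must be a letter, currency symbol or connecting character.
        int first = name.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Every following code point may additionally be a digit (among others).
        for (int i = Character.charCount(first); i < name.length(); ) {
            int cp = name.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidIdentifier("_field"));  // leading underscore
        System.out.println(isValidIdentifier("$cost"));   // leading currency character
        System.out.println(isValidIdentifier("9lives"));  // leading digit
    }
}
```

The first two calls print true and the last prints false, because a digit is allowed inside an identifier but not at its start.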

What are connecting characters?

A connecting character joins two words together. Unicode's connector punctuation category (Pc) lists ten of them:

U+005F LOW LINE _
U+203F UNDERTIE ‿
U+2040 CHARACTER TIE ⁀
U+2054 INVERTED UNDERTIE ⁔
U+FE33 PRESENTATION FORM FOR VERTICAL LOW LINE ︳
U+FE34 PRESENTATION FORM FOR VERTICAL WAVY LOW LINE ︴
U+FE4D DASHED LOW LINE ﹍
U+FE4E CENTRELINE LOW LINE ﹎
U+FE4F WAVY LOW LINE ﹏
U+FF3F FULLWIDTH LOW LINE ＿
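
A list like this can also be produced from the JDK itself by scanning for code points whose general category is connector punctuation. The sketch below uses the standard Character.getType and Character.getName methods; the class name ConnectorPunctuation is invented for the example, and the result depends on the Unicode version your JDK ships with.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name; only the Character methods are standard API.
class ConnectorPunctuation {

    // Collects every code point whose general category is Pc (connector punctuation).
    static List<Integer> connectors() {
        List<Integer> result = new ArrayList<>();
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (Character.getType(cp) == Character.CONNECTOR_PUNCTUATION) {
                result.add(cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (int cp : connectors()) {
            // Prints the code point in U+XXXX form, its Unicode name and the character itself.
            System.out.printf("U+%04X %s %s%n",
                    cp, Character.getName(cp), new String(Character.toChars(cp)));
        }
    }
}
```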

And if you try the following, you may find it compiles. (On Java 9 and later a lone _ is a keyword, so the first name has to be dropped there.)

    int _, ‿, ⁀, ⁔, ︳, ︴, ﹍, ﹎, ﹏, ＿;

While this is interesting, does it have a use? Recently I found one.
I have an object that represents a column, and each row has a value for that column. The two names are basically the same, but I want a notation to distinguish them. So I have something like

    Column<Double> ︴tp︴ = table.getColumn("tp", double.class);
    double tp = row.getDouble(︴tp︴);

This way I can see which tp is the column and which is the value.
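
To make the naming trick concrete, here is a small self-contained sketch. The post does not show the real table API, so Column, Row and ColumnNaming below are hypothetical stand-ins written only to let the two tp names compile side by side. The source file needs to be compiled as UTF-8.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the table API; only the naming convention matters here.
class ColumnNaming {

    // A column is identified by its name in this sketch.
    static final class Column<T> {
        final String name;
        Column(String name) { this.name = name; }
    }

    // A row stores one double per column name.
    static final class Row {
        private final Map<String, Double> values = new HashMap<>();
        void set(Column<Double> column, double value) { values.put(column.name, value); }
        double getDouble(Column<Double> column) { return values.get(column.name); }
    }

    static double demo() {
        Column<Double> ︴tp︴ = new Column<>("tp"); // the column, wrapped in connecting characters
        Row row = new Row();
        row.set(︴tp︴, 1.5);
        double tp = row.getDouble(︴tp︴);          // the plain name is the value in this row
        return tp;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because ︴tp︴ and tp are different identifiers, both can live in the same scope without a prefix or suffix such as tpColumn.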

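Interestingly, the currency characters are valid as well.

    for (int i = Character.MIN_CODE_POINT; i <= Character.MAX_CODE_POINT; i++)
        if (Character.isJavaIdentifierStart(i) && !Character.isAlphabetic(i))
            // Character.toChars also handles code points above U+FFFF, which a (char) cast would truncate
            System.out.println(i + " : " + new String(Character.toChars(i)));

prints (the exact list depends on the Unicode version of your JDK)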

36 : $
95 : _
162 : ¢
163 : £
164 : ¤
165 : ¥
1547 : ؋
2546 : ৲
2547 : ৳
2555 : ৻
2801 : ૱
3065 : ௹
3647 : ฿
6107 : ៛
8255 : ‿
8256 : ⁀
8276 : ⁔
8352 : ₠
8353 : ₡
8354 : ₢
8355 : ₣
8356 : ₤
8357 : ₥
8358 : ₦
8359 : ₧
8360 : ₨
8361 : ₩
8362 : ₪
8363 : ₫
8364 : €
8365 : ₭
8366 : ₮
8367 : ₯
8368 : ₰
8369 : ₱
8370 : ₲
8371 : ₳
8372 : ₴
8373 : ₵
8374 : ₶
8375 : ₷
8376 : ₸
8377 : ₹
43064 : ꠸
65020 : ﷼
65075 : ︳
65076 : ︴
65101 : ﹍
65102 : ﹎
65103 : ﹏
65129 : ﹩
65284 : ＄
65343 : ＿
65504 : ￠
65505 : ￡
65509 : ￥
65510 : ￦
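
Every entry in the list above is either a currency symbol or a connecting character. A rough way to confirm this on your own JDK is to group the same code points by their Unicode general category. This is only a sketch: the class name IdentifierStartCategories and the category labels are made up for the example, and the counts vary with the Unicode version.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class name; the Character methods and constants are standard API.
class IdentifierStartCategories {

    // Counts the non-alphabetic identifier-start code points per general category.
    static Map<String, Integer> countByCategory() {
        Map<String, Integer> counts = new TreeMap<>();
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (Character.isJavaIdentifierStart(cp) && !Character.isAlphabetic(cp)) {
                String category;
                switch (Character.getType(cp)) {
                    case Character.CURRENCY_SYMBOL:
                        category = "currency symbol";
                        break;
                    case Character.CONNECTOR_PUNCTUATION:
                        category = "connector punctuation";
                        break;
                    default:
                        category = "other";
                }
                counts.merge(category, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        countByCategory().forEach((category, n) -> System.out.println(category + " : " + n));
    }
}
```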

Comments

  1. You can write:

    if (⁀‿⁀ == ⁀⁔⁀)

    Replies
    1. Peter, please have mercy on those who might one day need to support that code. They would definitely be at risk of literally tearing their hair out. :-)
      I once worked on a legacy project where one class had two methods named '_' and '__', with one calling the other. I'd say that was beyond the ordinary experience.

  2. This comment has been removed by the author.

  3. Ok, but I get "The specified message [10527236] was not found. "

    Replies
    1. This comment has been removed by the author.

    2. It looks like it's triggering full GCs because you are running close to your direct memory limit. You can reduce the triggering of GCs by manually freeing the direct blocks, but it appears you are using more than you allowed. I suggest reducing the heap size, as you are using only about 5.3 GB. Try a minimum of 5 GB and a maximum of 8 GB, and you can increase your direct memory size to, say, 60 GB. Unfortunately, memory profilers are not much help when it comes to direct memory. If you really need this much data, I would suggest considering a) compacting the memory by using smaller data types, or b) memory mapped files, so some of the data can be gracefully swapped to disk. By the way, memory mapped files don't count towards your maximum direct memory.

    3. Thanks for the suggestion, it is very helpful :) We found out that some library jars we use were allocating a lot of direct memory, and we are trying to fix that.

    4. I took advice b) instead :) I changed the 40 GB of direct memory to a memory mapped tmpfs file and will see whether it works. Just to double check: if I do not specify MaxDirectMemorySize, the JVM should use the heap size as the upper limit for the socket's direct memory allocation, right? Also, is MappedByteBuffer not counted as direct memory?

  4. Yes, I tried Chinese characters before, and they compiled. In fact, I guess identifiers can be any Unicode character, just with a non-digit start.

    Replies
    1. So apart from the keywords, I could generate a Java source file completely in a non-English language like Chinese, with all class names, field names, method names and local variables in Chinese.

