State in Functions: Cache, Lazy + Memoization, and Counter

Dive into functional program as we consider using state in functions. See what use cases might allow for this compromise in functional ideology.

Nawa Manusitthipol

Apr. 26, 18 · Tutorial

Likes (17)

Comment

Save

19.5K Views

Functional programming advocates for programs with no state. Sometimes, however, we need to use state — not for holding data that changes over time like entities — but for other facilitating purposes. There happens to be a way to hold state in Java functions without turning it into an object. This article discusses how to add state to your functions with practical examples: cache, lazy initialization with memoization, and proving indexes.

State for Functions

One of the important principles of Functional Programming (FP) is to avoid having state. State generally makes it more difficult to reason about code, basically, by adding a temporal dimension to the logic. They also represent a side effect — also known as enemy number one of FP. But just like gray, there are fifty shades of functional purity. If having state provides practical benefits AND (big 'AND') they do not cause many problems — such as unpredictability — it is sometimes worth considering it. So, how will we go about having some state in our functions without turning it into an object? By staying a function, the state has no way to unexpectedly leak out, as there is only one way in and out of the function.

Secret Storage Space

There is a secret storage space in Java that most people often overlook ("under-exploited" might be a better word). That storage space is closure constants. Closure constants are the constants that are attached to the lambda that use them. Consider the following code:

public List<Integer> getSmall(int threshold) {
    return list.stream()
        .filter(i -> i <= threshold)
        .collect(toList());
}

threshold is the closure constant for the lambda i -> i <= threshold. From the look of it, threshold does not seem to be special, as it looks just like any other variables (or parameters) that can be accessed by the code in the scope. But in reality, it is attached to the lambda (created together with the lambda) and tags along with the lambda through the adventure in the filter method. If it is still not clear to you, take a look at another code snippet:

private Predicate<Integer> smallerThan(int threshold) {
    return i -> i <= threshold;
}
...

    List<Integer> smalls
    = list.stream()
        .filter(smallerThan(5))
        .collect(toList());
...

As you can see, the lambda created inside smallerThan is returned out of the method. And threshold follows along as a good companion, eager to help answer if any given number is too small. This can be seen as a partial application of a function or some form of currying. Now that you know the secret place, let's make use of it.

Quick Cache

Let say you need to read some information from a database or over the network. This information does not get changed often, so it is OK to cache it for a short period of time such as the life of a request. Using a closure constant to hold the value, we can use it to create a cache. The following code demonstrates how to use it to create a cache for any functions:

public static <I, O> Function<I, O> cacheOf(Function<I, O> inFunction) {
    val cache = new ConcurrentHashMap<I, O>();
    return in -> cache.computeIfAbsent(in, inFunction::apply);
}

NOTE: Lombok's val is used for brevity.

And then you can use the cache function like this ...

@RequestScoped
public class MrScopeBean {
    @Inject
    private CompetitionService competitionService;
    @Inject
    private UserService userService;
    @Inject
    private CityService cityService;

    private final Function<String, String> cityName = cacheOf(cityService::getCityById).andThen(City::getName);
    /** Returns the winner name and the city. */
    public List<String> getWinnerNamesAndCities() {
        return competitionService
            .getWinners().stream()
            .map(userService::getUserById)
            .map(user -> user.getName() + " @ "+ cityName.apply(user.getCityId()))
            .collect(toList());
    }
}

The function returned from the method cacheOf has a cache embedded inside. So multiple calls for the same city id will only trigger the actual service call once.

WARNING: Improperly implemented caching can cause a memory leak!

Before we go on, let's get this warning out first. Caching improperly can cause a nasty memory leak that results in the certain death of your application. So, take great care when using caches in long-running applications. The cache implementation shown here is reasonable for short-to-medium live caching needs. Since the actual cache map is with the cityName function, once the function is garbage collected, the whole map is freed to be garbage collected as well. Thus, putting the cached-function object (cityName) in the request scoped object or bean will limit its live span there — given it is not leaked to outside this bean.

Lazy Initialization With Memoization

Let's say you have an operation that is expensive to perform, and the result of the operation can be reused within an acceptable period of time. The function below might be useful.

public static <T> Supplier<T> lazy(Supplier<T> supplier) {
  val dummy     = new Object();
  val reference = new AtomicReference<Object>();
  return ()->{
    if (reference.compareAndSet(null, dummy)) {
      try {
        val value = supplier.get();
        reference.set((Supplier<T>)(()->value));
      } catch (RuntimeException e) {
        reference.set(e);
      }
    }
    while (!(reference.get() instanceof Supplier)) {
      if (reference.get() instanceof RuntimeException)
        throw (RuntimeException)reference.get();
    }
    return ((Supplier<T>)reference.get()).get();
  };
}

In this function, a supplier to perform and return something is given as a parameter. An AtomicReference is then used to stored an object as adummy to mark that the first processing has started. When the computation is done, a supplier of the value is then set to the reference (replacing the dummy). In the case of exception, the exception will then be assigned to the reference (again, replacing the dummy). If other threads get in while the computation is not done yet, that threads will be in the while loop until the supplier is done or an exception is found as the value. It is done this way to ensure that the supplier only run when needed and only run once no matter how many time it is called (even calls from multiple threads). You can see how I test it here. With all that out of the way, the function might be used like this:

@RequestScoped
public class MrScopeBean {
    @Inject
    private ConfigService configService;
    @Inject
    private UserService userService;
    private final Function<String, User> userById = userService::getById;
    private final Supplier<Configurations> configurations = lazy(configService::readConfigFromFile);
    public boolean isQualify(String id) {
        val userPoint = userById.andThen(User::getLoyatyPoint).apply(id);
        val qualPoint = configurations.get().getQualifiedPoint();
        return userPoint >= qualPoint;
    }
}

In this example code, reading configuration from a file is expensive, so the lazy function is used to make it lazily applied — and its result is memoized. Just like the cache example earlier, the configurations are only stored within the life of if the supplier object configurations.

Counter: Index in a Stream

Let's say you need to have access to the index of elements within the code that uses Java 8's Stream API. For example, given a list, you need to filter it with a certain condition and return the indices of the ones fitting the condition. Here is how you might approach this:

public static <I, O> Function<I, O> withIndex(BiFunction<I, Integer, O> body) {
    val index = new AtomicInteger();
    return input -> body.apply(input, index.getAndIncrement());
}

The function above will give you access to the index of each element so you can make use of it.

public List<Index> getQualifiedWinersIndexes(List<User> winners) {
  return winners.stream()
    .map(withIndex(Pair::new))  // Pair is a tuple of 2
    .filter(pair->pair._1.getLoyatyPoint() > configurations.get().getQualifiedPoint())
    .map(pair._2)
    .collect(toList());
}

In the code above, the winners (Users) are streamed and mapped to a pair using withIndex. withIndex take a BiFunction of index and the element (the winner), then returns a pair holding both the index and the winner. Then, we filter the winner that has a qualified loyalty point. Finally, we collect only the index and return it back.

Or if you want to print the element with its index, you might make one with BiConsumer instead.

public static <I, O> Consumer<I> withIndex(BiConsumer<I, Integer> body) {
    val index = new AtomicInteger();
    return input -> body.accept(input, index.getAndIncrement());
}
public void print(List<User> winners) {
    System.out.println(
        winners.stream()
        .filter(winner->winner.getLoyatyPoint() > configurations.get().getQualifiedPoint())
        .map(withIndex((winner, idx)->idx + ": " + winner.getName()))
        .collect(joining("\n"))
    );
}

The index value is actually the increment of an atomic integer like a counter. So every time the function is called, the index is incremented. Since the function is called once for each iteration, you get an incremental index for each of those. That means the index is not exactly the index of the element in the original list, but the index of the elements that pass through that operation. If the stream passes through filters before going through withIndex, for example, the resulting index will be the index of the elements after the filter, as you see in the second example.

Conclusion

In pure functional programming, functions must not have state. However, there are practical reasons for a function to have state. Closure constants holding mutable objects can be used as private storage of a function. They can be used to store necessary states. Still, proper consideration must be given to ensure benefits without introducing complexity and unpredictability.

What do you think about state in functions? Will you use something like the ideas presented here in your code? Have you used closure constants for something rather than just currying?

Cache (computing) Memoization

Opinions expressed by DZone contributors are their own.

Related

Trending