Winner of the Data Streaming Company of the Year award in 2022, NASA has already been recognized for its advanced use of Apache Kafka®. What the organization has already achieved with data streaming is impressive, but the agency’s unique internal structure means its mission teams are adopting this technology at different paces.
As the Cloud Computing Program manager at the Goddard Space Flight Center, Joe Foster has unique insight into the challenges that individual teams at NASA have faced when adopting new platforms. During this year’s Data in Motion Tour, Confluent Field CTO Will LaForest sat down with Joe to talk about his team’s role in cloud and data streaming adoption across the agency, as well as the importance of immediacy in NASA’s work.
In this Q&A recap, find out how and why the demand for data streaming has grown across the agency.
Will: Although everyone might think they know NASA, there’s a lot more going on than people realize, from Ames Research Center to JPL and Goddard. So first, can you explain the overall structure of NASA? I think understanding that is really key to realizing how important data streaming has become to the bigger picture.
Joe: Like you mentioned, everybody knows NASA as a brand name, but not everybody understands the ins and outs of how we work. NASA has ten centers plus its headquarters in downtown DC. Our headquarters really operates as the policy shop and the money shop. The centers are where the work is actually done.
And it’s quite diverse, as each center can have multiple physical facilities located across the country. Each center has its own management structure and its own core set of capabilities. Being so distributed, we actually have to have distributed IT as well.
The vast majority of the money that flows into IT spending in NASA lives in individual mission directorates. Ultimately, that means that as government civil servants, my team and I are focused on coming up with new, innovative things to help mission teams with their goals. If we’re not making progress, those teams will just go do things by themselves.
Will: That’s a great segue into my next question. Tell us more about your job – what are your responsibilities as a Cloud Computing Manager at Goddard?
Joe: Around four and a half years ago, I became the first full-time civil servant at NASA dedicated to cloud computing and came on as the cloud computing program manager. Before that, everyone else who worked on cloud computing at NASA was doing small-scale projects. It might have been a data center manager who backed some things up in the cloud or an application developer shop that wanted to work in the cloud.
But overall, there was this high barrier to entry. Users had to get an AWS account and then use vanilla, off-the-shelf tools and figure everything out on their own. So my charge was to find ways to accelerate the mission adoption of cloud. I didn't care what's in the data center. Migrating the data center was not my goal.
Instead, I was supposed to find ways to partner with mission teams and find ways to accelerate their adoption of the cloud. So we built something called the Mission Cloud Platform.
The challenge was that there’s no mandate to use the platform, so we had to convince people to opt in. And three years in, despite starting with no plan and next to no budget, we have 145 projects across all ten NASA centers.
Will: That’s an amazing feat. And I think this idea of lowering the barrier to entry to adopt technology is going to be an ongoing theme of this conversation.
With that in mind, clearly, cloud is of strategic importance to NASA. Your responsibility is to get people into the cloud, but where does that fit into the overall strategy for NASA?
Joe: We're moving large volumes of data to the cloud, right? NASA has an edict to share the science data publicly to the maximum extent possible. To give you an idea of the scale, we now have 60 petabytes of data in the cloud. And for the month of January 2023 alone, we had almost 4 million compute hours in the cloud.
Getting data into the cloud and sharing that data with our channel partners is a high priority for us. At the end of the day, we don't want to just do “lift and shift.” We want to actually try to be innovative with things that we do. And data streaming actually feeds into that significantly.
Will: Today NASA uses Confluent’s cloud service for data streaming, but that wasn’t always the case. How did you become aware of Confluent?
Joe: We have a project called the General Coordinates Network (GCN), which happens to be the project that Confluent recognized us for with the award last year. Essentially, GCN is a way to modernize the astronomy community. For the last 2-3 decades, the process by which astronomers communicate with each other was very clunky and had significant time lapses.
The mission team wanted to get to a place where they had a real-time alerting system. The idea was to create a framework where we have a publisher and subscriber model and send out real-time alerts whenever somebody sees a supernova or other transient activity in the sky, whether that person is an astronomer, a government employee, or an amateur in their backyard.
For the scientific community, discoverability and the actual heritage of who found something first is incredibly important, especially because different telescopes have different capabilities. With this system, collectively across the globe, everyone could record and stream their observations in real time.
We were able to bring GCN to life, and it’s now been operating for a year and a half. In that short time, it’s already led to enhanced scientific discovery because we’re collecting so many more observations on the same event, right as it happens.
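For readers less familiar with the publisher and subscriber model Joe describes, here is a minimal in-memory sketch in Python. It is not GCN's implementation and it does not use Kafka or any Confluent client; the `AlertBroker` class, the `transient-alerts` topic name, and the alert fields are all hypothetical, chosen only to show how one published alert fans out to every subscriber at once.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Alert = Dict[str, object]
Handler = Callable[[Alert], None]


class AlertBroker:
    """Toy in-memory stand-in for a streaming platform (hypothetical, not Kafka's API)."""

    def __init__(self) -> None:
        # Maps each topic name to the handlers subscribed to it.
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler that is called for every alert on the topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, alert: Alert) -> int:
        """Deliver the alert to every subscriber and return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(alert)
        return len(handlers)


broker = AlertBroker()
observatory_inbox: List[Alert] = []
amateur_inbox: List[Alert] = []

# Two independent subscribers follow the same (hypothetical) topic.
broker.subscribe("transient-alerts", observatory_inbox.append)
broker.subscribe("transient-alerts", amateur_inbox.append)

# One publisher reports an event; both subscribers get it immediately.
delivered = broker.publish(
    "transient-alerts",
    {"event": "supernova candidate", "reported_by": "backyard observer"},
)
print(f"alert delivered to {delivered} subscribers")
```

The publisher never needs to know who is listening, which is what lets a professional observatory and a backyard amateur receive the same alert at the same moment. A real streaming platform adds durability, ordering, and network delivery on top of this basic idea.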
Will: Sometimes it’s actually quite difficult to explain to people just how important immediacy is for specific missions. Often, it’s easier to understand why it matters for things like banking and transportation because people can relate more easily.
With astronomy, they might envision someone taking their time looking into their telescopes and writing papers. But in reality, seconds and even milliseconds make a huge difference. Can you explain more about the importance of immediacy in this field and how the appetite for data streaming grew within NASA?
Joe: Take a supernova. That’s some of the most transient activity in the sky. It might be there for only 30 seconds. So getting that near-real-time alerting allows us to get as many observations of it as possible. That additional data provides context that’s extremely important, and it's yielding an immediate impact on the community as well.
As for our journey with data streaming, we wanted real-time analytics, and of course that led us to real-time data streaming and Kafka. That’s where we started, and we realized that it was going to be a pain to manage on our own. So we looked at Amazon MSK. Although that managed Kafka service would be a bit better than self-managing open-source Kafka, it still would have required a lot of labor on our part.
Our lives have been a lot better and easier using the managed cloud service from Confluent.
Will: From what I’ve heard, Confluent’s cloud service has actually become really important to NASA broadly, not just for individual projects like GCN. Can you touch on that and why there’s so much internal demand for data streaming?
Joe: Well, when I first met up with the team on the GCN project, I was somewhat surprised to learn that there was a Kafka community of practice that had self-organized inside the agency. At the time, there were already around 15 projects that were all talking to each other about lessons learned and sharing the struggles of using open-source Kafka versus solutions like Confluent Cloud.
So going back to what I explained earlier—my team’s job is to provide services that help the mission teams and ensure they’re not off doing their own thing. It was pretty much a no-brainer decision, so we started looking at how we could design or bring in an enterprise cloud service to fill this need.
The value proposition was strong, and we already had this existing community of practice. We knew we needed to sponsor this project and bring this on as an enterprise service so we could then offer it more broadly across the organization.
Organizations like NASA are realizing the value and accelerating innovation with data streaming every day. To hear how more practitioners and technical leaders are using Confluent to overcome Kafka challenges, sign up to attend the Data in Motion Tour 2023 in a city near you.
And to connect with an even larger community of Kafka practitioners and contributors, register for Current 2023, September 26-27 in San Jose, CA.