
View Full Version : Build your own data-stream mining NSA in the cloud with “FunnelCake”




tangent4ronpaul
07-25-2013, 10:54 PM
http://arstechnica.com/information-technology/2013/07/funnelcake-cloud-service-is-diy-nsa-for-data-stream-miners/
(click through for links and pics)


Build your own data-stream mining NSA in the cloud with “FunnelCake”
BrightContext puts a SQL-like language atop analytics tech spawned by Twitter.

by Sean Gallagher - July 25 2013, 2:15pm EST

Big Data
Cloud
Development
Open Source

There's more to "big data" than just lots of bits on disks. Some things you can't just store in the raw; others you need to analyze and process before they ever hit a disk. That way they can be acted upon in near-real time—like trying to pick specific communications sessions from the data stream of a network tap into an Internet backbone, for example.

Stream processing, also known as Complex Event Processing, is the real-time querying, analysis, and conversion of information within torrents of live data. It's part of what deep packet inspection and packet capture systems do with network traffic. These tools apply a set of rules to filter out what to capture within Internet packets, then aggregate and transform what's in them into captured content and metadata about the content of those packets. Security information and event management (SIEM) systems do the same thing with log files, reports from packet sniffers, and other sources. They pull multiple streams together and analyze them for connections and possible security threat signatures.

But the need for stream processing isn't unique to intelligence organizations like the National Security Agency (NSA). A software-as-a-service startup called BrightContext has released a new version of its stream processing service. BrightContext makes building complex stream processing systems as easy as filling out a Web form and writing a few lines of script. Now, nearly anyone can create their own miniature NSA data center in Amazon's cloud. The language, called FunnelCake, looks a bit like SQL or JavaScript. But unlike SQL, it's designed for never-ending queries against huge streams of data running in parallel.

Riders on the Storm

BrightContext CEO John Funge and CTO Leo Scott started working on the ideas behind the service in 2010. The pair sold their photo-sharing service, Pickle.com, to Scripps Networks (the owners of cable channels like HGTV and Food Network). Next, the duo was pulled into helping Scripps deal with problems involving high volumes of user interactions. "With TV audiences, you have millions of viewers," Funge said. "But as soon as you start having viewers interact, your Web servers take a hit. The problem is, how do you take in all this input and translate it to give stuff back?"

Part of the answer to that question arrived in 2011, when Twitter acquired data analytics startup BackType and then published a big part of BackType's technology as free and open source software. The software, called Storm, is the real-time computing platform used to power analytics and other stream-driven features at Twitter and other companies like Groupon or the Weather Channel. It's also a major piece of the underpinnings of BrightContext's platform.

"Big data" analytics systems that work with large repositories of data, such as Hadoop, typically deploy large numbers of worker apps running in parallel to sort through information and return results. Storm is designed to do a similar thing with data "in flight." It allows developers to build perpetually running worker applications, called "bolts," that can be plugged together in workflows to search, aggregate, and transform raw data into usable information.
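To make the "bolts plugged together in workflows" idea concrete, here is a minimal sketch in plain Python. It does not use Storm's actual API; the function names (filter_bolt, transform_bolt, count_bolt) and the sample events are made up for illustration, and generators stand in for perpetually running workers.

```python
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical stand-ins for Storm "bolts": each stage consumes a stream
# and yields a new one. These are plain generators, not Storm's API.

def filter_bolt(events: Iterable[dict], keyword: str) -> Iterator[dict]:
    """Search step: pass through only events whose text mentions the keyword."""
    for event in events:
        if keyword in event["text"].lower():
            yield event

def transform_bolt(events: Iterable[dict]) -> Iterator[str]:
    """Transform step: reduce each event to the one field the next stage needs."""
    for event in events:
        yield event["user"]

def count_bolt(users: Iterable[str]) -> Iterator[Counter]:
    """Aggregate step: emit a running tally after every item."""
    tally = Counter()
    for user in users:
        tally[user] += 1
        yield tally.copy()

# Made-up sample data standing in for a live stream.
stream = [
    {"user": "alice", "text": "Watching the debate tonight"},
    {"user": "bob", "text": "Lunch was great"},
    {"user": "alice", "text": "That debate answer was weak"},
    {"user": "carol", "text": "Debate night!"},
]

# Plug the stages together into one workflow.
snapshots = list(count_bolt(transform_bolt(filter_bolt(stream, "debate"))))
final_tally = snapshots[-1]
print(final_tally)
```

Because each stage only pulls the next item when asked, the same chain would keep working on a source that never ends; the list here is just a convenient stand-in.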

Storm has some relatively simple programming interfaces, but the actual management of a cluster of Storm servers and integration into data sources is a bit more complex. BrightContext hides that complexity, including the messaging middleware used to wire together the worker applications and the management system that spawns them and restarts them when they crash. This is all behind a Web dashboard where the pieces run within Amazon's cloud.

The Washington Post and AOL were early customers, and they are still using BrightContext to process audience feedback. The Post used the service to perform analysis on audience interactions during the presidential debates, according to Funge. "Their front-end developers, who are part of the newsroom team, did almost all the work," Funge said. "They were so self-sufficient and grabbed our software developer kit and APIs with such a light amount of support from us that we were wondering if there was an issue with the software. But it was because it was so straightforward to them."
The Washington Post's sentiment tracker application, powered by BrightContext's stream processing, for the first Obama/Romney presidential debate.
Washington Post

A series of funnels

BrightContext's software-as-a-service versions of the Storm bolts are called "QuantChannels," which apply some filtering or transformation to data passing through them, and "ThroughChannels," which simply pipe data from a stream unprocessed into other applications. Both can be set up through the Web dashboard by defining the data elements that will come in through the stream. When they're turned on, their output can be directed back to an analytics application or other worker apps in the BrightContext cloud for further filtering and processing.

There are two kinds of QuantChannels, Funge said. "There's the more straightforward sort of stream processing where the servers running channels can act alone, like filtering." Those sorts of tasks can be set up with little code through BrightContext's Web dashboard. "Then there's the harder kind," he continued. "To get an answer, you need to know something about calculations going on across all the other servers, such as aggregation where you're combining all the results with math."
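The difference between the two kinds can be sketched roughly as follows. This is plain Python, not FunnelCake or BrightContext's API, and the function names and scores are invented for the example. Filtering needs nothing from other servers, while a correct overall average needs each server's partial sum and count to be combined.

```python
# Hypothetical illustration: "easy" per-server work vs. aggregation
# that needs results from every server.

def local_filter(events, min_score):
    """Each server can do this alone: keep events at or above a threshold."""
    return [e for e in events if e["score"] >= min_score]

def local_partial(events):
    """Each server reports a partial result: (sum of scores, number of events)."""
    return sum(e["score"] for e in events), len(events)

def merge_partials(partials):
    """Combine partials from all servers into one overall mean."""
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None

# Made-up events as they might be split across two servers.
server_a = [{"score": 2}, {"score": 4}]
server_b = [{"score": 9}]

partials = [local_partial(server_a), local_partial(server_b)]
overall_mean = merge_partials(partials)  # (2 + 4 + 9) / 3 = 5.0

# Averaging each server's own mean gives a different, wrong answer,
# which is why the servers have to share their partial sums and counts.
mean_of_means = (sum(t / c for t, c in partials)) / len(partials)  # (3 + 9) / 2 = 6.0
print(overall_mean, mean_of_means)
```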
Documentation of two of the five methods used in FunnelCake to process data streams.
BrightContext

This is where FunnelCake comes in. It can be used to construct complex sets of calculations and aggregations, doing in a handful of lines of script what would normally be days of coding. "We are able to take what would otherwise be thousands of lines of Java code or some other language in Storm and boil it down to a few lines of code in FunnelCake," Funge said.

BrightContext isn't the only player in the cloud market for handling data streams. Axeda, for example, offers a cloud-based platform for handling machine-to-machine data streams and tying them into enterprise information systems. But BrightContext is one of the first companies to build this sort of general-purpose stream processing as a service platform in a public cloud. And it has given anyone with a data stream the ability to mine their own with NSA-like capability on demand.

-t

tangent4ronpaul
07-25-2013, 11:33 PM
Do we have anyone on the forum experienced with this tech, big data, clusters, spiders, etc.?
What with the NSA exposures, something that has been bouncing around in my head has been "why couldn't we do something like that?". No, I don't mean spying on private conversations, I mean spying on public conversations. You just need a stream and ...

So how about some BRAINSTORMING peeps? What could we use this for?

My first thought was Twitter. It should be possible to identify the most prolific tweeters and especially re-tweeters. That data could be used to build a list of Twitter accounts of people who will RT and who have large numbers of followers. The obvious point is to get things going viral. It could further be used to map social networks, our own and the opposition's. The twitterverse is generally composed of celebrities who follow celebrities, politicians who follow politicians and reporters who follow reporters. The fourth group is everyone else, who follow the first 3 and each other. If we can get a few reliable RT's in each camp, that would give us a lot of reach.
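A rough sketch of the counting side of this idea, in Python. It assumes the tweets have already been collected into simple records; the field names ("user", "retweet_of") and the sample data are made up, and no real Twitter API is used here.

```python
from collections import Counter

# Made-up records: "retweet_of" names the original author, or None
# when the tweet is not a retweet.
tweets = [
    {"user": "dana", "retweet_of": "alice"},
    {"user": "dana", "retweet_of": "alice"},
    {"user": "dana", "retweet_of": "bob"},
    {"user": "erin", "retweet_of": "alice"},
    {"user": "alice", "retweet_of": None},
]

def top_retweeters(tweets, n):
    """Rank users by how many retweets they posted."""
    counts = Counter(t["user"] for t in tweets if t["retweet_of"] is not None)
    return counts.most_common(n)

def retweet_edges(tweets):
    """Count (retweeter, original author) pairs: a crude social-network map."""
    return Counter(
        (t["user"], t["retweet_of"]) for t in tweets if t["retweet_of"] is not None
    )

leaders = top_retweeters(tweets, 1)
edges = retweet_edges(tweets)
print(leaders)
print(edges)
```

The edge counts are the kind of input that network-mapping tools expect: who amplifies whom, and how often.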

This could also be used to monitor how effective our own information distribution is. For example, with the Meetups we are limited in how many groups can be contacted. When contacted, some organizers post info to their groups and some never do. Knowing which don't is a valuable bit of information, so that bottleneck can be routed around.

As an early warning tool. We often hear about things hours before the media reports on them, and we usually find out from Twitter. Likewise, links that are a lot better than what the MSM serves up are good to find and spread.

When it comes to other sites, like FB, a number of groups could be monitored to judge level of activity, support, plans, etc.

I've mentioned it before, but more in terms of capturing reporting on RP. I think we should capture news broadcasts and use them to mock the MSM and WH press office on a daily basis. Basically, record each channel along with the subtitles and use the subtitles as metadata, paired with video offsets on disk. Every news channel is going to be using the exact same talking points/phrases, so just do word counts and those phrases will stick out like a sore thumb. Daily, put together a video collage and tweet it. Make special note, 1984 style, when words are changed: Newspeak: "Whistleblower" is now "leaker", everybody change your narrative.
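The word-count idea could be sketched like this in Python. It assumes the subtitles have already been captured as text per channel; the channel names, caption text, and function names are invented for the example. Instead of single words it counts three-word phrases and reports the ones that show up on more than one channel.

```python
import re
from collections import defaultdict

def ngrams(text, n=3):
    """Return the set of n-word phrases in a caption transcript."""
    words = re.findall(r"[a-z']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_phrases(captions_by_channel, n=3, min_channels=2):
    """Find phrases that appear on at least min_channels different channels."""
    channels_using = defaultdict(set)
    for channel, text in captions_by_channel.items():
        for phrase in ngrams(text, n):
            channels_using[phrase].add(channel)
    return sorted(p for p, chans in channels_using.items() if len(chans) >= min_channels)

# Made-up subtitle text standing in for a day's captured captions.
captions = {
    "channel_a": "Officials say the leaker fled the country on Sunday",
    "channel_b": "The leaker fled the country, sources confirm",
    "channel_c": "Markets closed higher today",
}

shared = shared_phrases(captions)
print(shared)
```

Storing each phrase with its subtitle timestamp instead of just the channel name would give the video offsets needed to cut the matching clips.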

other possibilities... what are your ideas?

-t

HOLLYWOOD
07-25-2013, 11:43 PM
I love funnelcakes, but they're high in carbs, gluten, and fried in oil. bad for you :( ...but not as bad as the NSA, carnivore, etc etc etc

PS: Have you ever datamined NSA contractor corporations and their employees? You would be startled at how many pieces of the puzzle you can put together. Interesting eye-openers for those unfamiliar with the intelligence community.

http://www.funcarnivalfoods.com/funnelcake.jpg (http://www.funcarnivalfoods.com/funnel_cakes_michigan_ohio_indiana_illinois.html)

CPUd
07-26-2013, 12:07 AM
Do we have anyone on the forum experienced with this tech, big data, clusters, spiders, etc.?


Yes. I am doing PhD work in this area.

This book is a prerequisite for anyone wanting to understand more about it (scroll down, it is free):
http://nlp.stanford.edu/IR-book/

tangent4ronpaul
07-26-2013, 04:53 AM
http://arstechnica.com/information-technology/2012/09/big-brother-meets-big-data-the-next-wave-in-net-surveillance-tech/
http://www.brightcontext.com/
http://storm-project.net/

-t

puppetmaster
07-26-2013, 06:54 AM
http://arstechnica.com/information-technology/2012/09/big-brother-meets-big-data-the-next-wave-in-net-surveillance-tech/
http://www.brightcontext.com/
http://storm-project.net/

-t

Wow I see some good uses for this....money maker.

tangent4ronpaul
07-26-2013, 04:21 PM
http://www.analytictech.com/ucinet/
http://www.analytictech.com/downloadnd.htm
http://pajek.imfm.si/doku.php
http://www.hsdl.org/?view&did=732086

-t

Carson
07-26-2013, 06:13 PM
I've been using STD SWIM. (Solar Terrestrial Dispatch-Solar Terrestrial Weather Monitor)

It is the newer version of STD.

It was designed to keep track of solar weather events, to help with watching the Northern and Southern Lights.

What it does is allow you to enter a website, like a webcam, and it will capture and save the information for later viewing. You can also play back the captured information like a video or GIF. You can track a lot of things besides solar weather.

http://www.spacew.com/swim/index.html



The latest version is Proplab-Pro version 3. I haven't tried it but it can be fascinating watching the information download as it comes in on SWIM. You can set it to show new images.

http://www.spacew.com/proplab/index.html



Another way to do sort of the same thing cheaper used to be Web Cam Watcher. I'm not sure if it's around any more.

Yes! Why yes it is. (Pretty creepy when they come out and say, "Note: Webcam Watcher is NOT Spyware or Adware", but I've used it and found it fun.)

http://www.webcam-watcher.com/

Carson
07-26-2013, 06:17 PM
Thread music?


The Alan Parsons Project- Eye in the Sky

http://www.youtube.com/watch?v=NNiie_zmSr8